WikiQA dataset

Size: 6.77MB

File type: .zip

Coins: 2

Downloads: 0

Published: 2023-11-02
Language: Other
Tags: NLP

Resource description

We describe the WikiQA dataset, a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering. Most previous work on answer sentence selection focuses on a dataset created using the TREC-QA data, which includes editor-generated questions and candidate answer sentences selected by matching content words in the question. WikiQA is constructed using a more natural process and is more than an order of magnitude larger than the previous dataset. In addition, the WikiQA dataset also includes questions for which there are no correct sentences, enabling researchers to work on answer triggering, a critical component in any QA system. We compare several systems on the task of answer sentence selection on both datasets and also describe the performance of a system on the problem of answer triggering using the WikiQA dataset.
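
The difference between answer sentence selection and answer triggering comes down to whether a question has any candidate sentence labeled correct. The following is a minimal sketch of that split, assuming each candidate is reduced to a (question id, label) pair with label 1 marking a correct sentence. The sample pairs and the names `SAMPLE_PAIRS` and `split_by_answerability` are hypothetical and are not part of the dataset distribution.

```python
# Hypothetical (question_id, label) pairs; label 1 marks a correct answer sentence.
SAMPLE_PAIRS = [
    ("Q1", 0), ("Q1", 1), ("Q1", 0),
    ("Q2", 0), ("Q2", 0),
    ("Q3", 1),
]


def split_by_answerability(pairs):
    """Return (answerable, unanswerable) question ids in first-seen order."""
    has_answer = {}
    for qid, label in pairs:
        # A question is answerable once any of its candidates is labeled 1.
        has_answer[qid] = has_answer.get(qid, False) or label == 1
    answerable = [q for q, ok in has_answer.items() if ok]
    unanswerable = [q for q, ok in has_answer.items() if not ok]
    return answerable, unanswerable


if __name__ == "__main__":
    answerable, unanswerable = split_by_answerability(SAMPLE_PAIRS)
    print("answerable:", answerable)
    print("unanswerable:", unanswerable)
```

In answer sentence selection only the answerable questions are scored; in answer triggering a system must also decline to answer the unanswerable ones.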

Code snippet and file information

import sys


def get_prf(fref, fpred, thre=0.1):
    """
    Get Q-A level and Q level precision, recall, and F-measure.
    """
    ref, pred = [], []
    qref, qpred_idx = [], []
    preqpos = ""
    qflag = False
    # Reference file: column 0 is the question id, column 3 is the 0/1 label.
    with open(fref, "r") as f:
        for line in f:
            parts = line.strip().split()
            qpos, l = parts[0], int(parts[3])
            # A change of question id closes the previous question's group.
            if qpos != preqpos and preqpos != "":
                if qflag: qref.append(1)
                else: qref.append(0)
                qflag = False
            preqpos = qpos
            ref.append(l)
            if l == 1: qflag = True
        if qflag: qref.append(1)
        else: qref.append(0)

    preqpos = ""
    maxval = 0.0
    maxidx = -1
    # Prediction file: column 0 is the question id, column 4 is the score.
    with open(fpred, "r") as f:
        for i, line in enumerate(f.readlines()):
            parts = line.strip().split()
            qpos, scr = parts[0], float(parts[4])
            if qpos != preqpos and preqpos != "":
                qpred_idx.append(maxidx)
                maxval = 0.0
                maxidx = -1
            preqpos = qpos
            if scr >= thre: pred.append(1)
            else: pred.append(0)
            # Track the line index of the top-scoring candidate per question;
            # maxidx stays -1 if no score in the group exceeds 0.0.
            if scr > maxval:
                maxidx = i
                maxval = scr
        qpred_idx.append(maxidx)

    # Q-A level: every question-sentence pair is scored individually.
    match_cnt, ref_cnt, pred_cnt = 0.0, 0.0, 0.0
    for r, p in zip(ref, pred):
        if r == 1: ref_cnt += 1.0
        if p == 1: pred_cnt += 1.0
        if r == 1 and p == 1: match_cnt += 1.0
    prec, reca = match_cnt / pred_cnt, match_cnt / ref_cnt

    # Q level: only the top-scoring candidate of each question is considered.
    # pred holds 0/1 flags, so for 0 < thre <= 1 the test pred[pidx] >= thre
    # is equivalent to pred[pidx] == 1.
    match_cnt, ref_cnt, pred_cnt = 0.0, 0.0, 0.0
    for r, pidx in zip(qref, qpred_idx):
        if r == 1: ref_cnt += 1.0
        if pred[pidx] >= thre: pred_cnt += 1.0
        if r == 1 and pred[pidx] >= thre and ref[pidx] == 1: match_cnt += 1.0
    qprec, qreca = match_cnt / pred_cnt, match_cnt / ref_cnt

    # Q level accuracy: correct answers plus correctly rejected questions.
    qmatch_cnt, qcnt = 0.0, 0.0
    for r, pidx in zip(qref, qpred_idx):
        qcnt += 1.0
        if r == 1 and pred[pidx] >= thre and ref[pidx] == 1: qmatch_cnt += 1.0
        elif r == 0 and pred[pidx] < thre: qmatch_cnt += 1.0
    qacc = qmatch_cnt / qcnt

    return [prec, reca, 2.0*prec*reca/(prec+reca), qprec, qreca, 2.0*qprec*qreca/(qprec+qreca), qacc]


if __name__ == "__main__":
    refname, predname = sys.argv[1], sys.argv[2]
    thre = 0.11
    if len(sys.argv) > 3:
        thre = float(sys.argv[3])
    results = get_prf(refname, predname, thre=thre)

    print("WikiQA Question Triggering: precision = %.4f, recall = %.4f, F1 = %.4f" % (results[3], results[4], results[5]))
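
The question-level numbers printed by the script can be easier to follow on in-memory data. Below is a simplified sketch of that idea, not the script itself: it assumes candidates are given as (question id, label, score) triples, picks the top-scoring candidate per question, and treats the question as answered when that score reaches the threshold. The function name `question_level_prf`, the sample rows, and the zero-division guards are additions for illustration; the default threshold mirrors the 0.11 used in the command-line entry point.

```python
def question_level_prf(rows, threshold=0.11):
    """Question-level precision, recall, and F1 for answer triggering.

    rows: iterable of (question_id, label, score) triples.
    """
    best = {}        # question_id -> (score, label) of the top-scoring candidate
    answerable = {}  # question_id -> True if any candidate is labeled 1
    for qid, label, score in rows:
        answerable[qid] = answerable.get(qid, False) or label == 1
        if qid not in best or score > best[qid][0]:
            best[qid] = (score, label)

    # A question is "triggered" when its best score reaches the threshold.
    triggered = sum(1 for s, _ in best.values() if s >= threshold)
    # A trigger is a match when the chosen candidate is actually correct.
    matched = sum(1 for s, l in best.values() if s >= threshold and l == 1)
    ref_cnt = sum(1 for ok in answerable.values() if ok)

    prec = matched / triggered if triggered else 0.0
    reca = matched / ref_cnt if ref_cnt else 0.0
    f1 = 2 * prec * reca / (prec + reca) if prec + reca else 0.0
    return prec, reca, f1


# Hypothetical data: Q1 is answered correctly, Q2 has no correct sentence but
# is triggered anyway, and Q3 has a correct sentence that scores too low.
SAMPLE_ROWS = [
    ("Q1", 0, 0.20), ("Q1", 1, 0.90),
    ("Q2", 0, 0.60), ("Q2", 0, 0.05),
    ("Q3", 1, 0.05), ("Q3", 0, 0.02),
]

if __name__ == "__main__":
    print("precision = %.4f, recall = %.4f, F1 = %.4f" % question_level_prf(SAMPLE_ROWS))
```

Unlike the script, this sketch returns 0.0 instead of raising when no question is triggered or no question is answerable, so its numbers may differ from the script's in those edge cases.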

Attribute       Size  Date        Time   Name
---------  ---------  ----------  -----  ----
Directory          0  2015-08-25  12:30  WikiQACorpus\emnlp-table\
File           75927  2015-08-25  12:30  WikiQACorpus\emnlp-table\WikiQA.CNN.dev.rank
File          172794  2015-08-25  12:30  WikiQACorpus\emnlp-table\WikiQA.CNN.test.rank
File           56909  2015-08-25  12:30  WikiQACorpus\emnlp-table\WikiQA.CNN-Cnt.dev.rank
File          129982  2015-08-25  12:30  WikiQACorpus\emnlp-table\WikiQA.CNN-Cnt.test.rank
File            2526  2015-08-25  12:30  WikiQACorpus\eval.py
File          791066  2013-11-15  12:51  WikiQACorpus\Guidelines_Phase1.pdf
File          356867  2013-12-10  14:16  WikiQACorpus\Guidelines_Phase2.pdf
File         6269938  2015-08-25  12:30  WikiQACorpus\WikiQA.tsv
File           26846  2015-08-25  12:30  WikiQACorpus\WikiQA-dev.ref
File          577103  2015-08-25  12:30  WikiQACorpus\WikiQA-dev.tsv
File          483564  2015-08-25  12:30  WikiQACorpus\WikiQA-dev.txt
File           11188  2015-08-25  12:30  WikiQACorpus\WikiQA-dev-filtered.ref
File          541172  2015-08-25  15:06  WikiQACorpus\WikiQASent.pos.ans.tsv
File           62167  2015-08-25  12:30  WikiQACorpus\WikiQA-test.ref
File         1304776  2015-08-25  12:30  WikiQACorpus\WikiQA-test.tsv
File         1103628  2015-08-25  12:30  WikiQACorpus\WikiQA-test.txt
File           23699  2015-08-25  12:30  WikiQACorpus\WikiQA-test-filtered.ref
File          217757  2015-08-25  12:30  WikiQACorpus\WikiQA-train.ref
File         4358942  2015-08-25  12:30  WikiQACorpus\WikiQA-train.tsv
File         3671710  2015-08-25  12:30  WikiQACorpus\WikiQA-train.txt
File          251928  2016-07-13  16:57  WikiQACorpus\LICENSE.pdf
File            4680  2016-07-13  17:00  WikiQACorpus\README.txt
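
The `.tsv` files in the listing can be grouped by question with the standard library. The sketch below makes several assumptions that should be checked against README.txt in the archive: that the file is UTF-8, that it starts with a header row, and that the header includes columns named `QuestionID`, `Sentence`, and `Label`. The function name `load_candidates` and the demo rows are hypothetical.

```python
import csv
import os
import tempfile
from collections import defaultdict


def load_candidates(tsv_path):
    """Group (sentence, label) candidates by question id.

    Assumes a header row with QuestionID, Sentence, and Label columns;
    adjust the names if the actual file differs.
    """
    by_question = defaultdict(list)
    with open(tsv_path, "r", encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps quotation marks inside sentences from being
        # interpreted as CSV quoting.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            by_question[row["QuestionID"]].append((row["Sentence"], int(row["Label"])))
    return dict(by_question)


if __name__ == "__main__":
    # Demo on a tiny made-up file rather than the real corpus.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".tsv", delete=False, encoding="utf-8", newline=""
    ) as tmp:
        tmp.write("QuestionID\tQuestion\tSentence\tLabel\n")
        tmp.write("Q1\texample question\tFirst candidate.\t0\n")
        tmp.write("Q1\texample question\tSecond candidate.\t1\n")
    try:
        for qid, candidates in load_candidates(tmp.name).items():
            print(qid, candidates)
    finally:
        os.remove(tmp.name)
```

Pointing `load_candidates` at `WikiQACorpus\WikiQA-train.tsv` should yield one entry per question if the assumed column names hold.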
