Resource Overview
We describe the WikiQA dataset, a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering. Most previous work on answer sentence selection focuses on a dataset created using the TREC-QA data, which includes editor-generated questions and candidate answer sentences selected by matching content words in the question. WikiQA is constructed using a more natural process and is more than an order of magnitude larger than the previous dataset. In addition, the WikiQA dataset also includes questions for which there are no correct sentences, enabling researchers to work on answer triggering, a critical component in any QA system. We compare several systems on the task of answer sentence selection on both datasets and also describe the performance of a system on the problem of answer triggering using the WikiQA dataset.
Code Snippet and File Information
import sys


def get_prf(fref, fpred, thre=0.1):
    """
    get Q-A level and Q level precision, recall, fmeasure
    """
    ref, pred = [], []
    qref, qpred_idx = [], []
    preqpos = ""
    qflag = False
    # Reference file: field 0 is the question ID, field 3 is the 0/1 label.
    # Text mode ("r") keeps the string comparisons valid on Python 3.
    with open(fref, "r") as f:
        for line in f:
            parts = line.strip().split()
            qpos, l = parts[0], int(parts[3])
            if qpos != preqpos and preqpos != "":
                # A new question starts: record whether the previous one had an answer.
                if qflag: qref.append(1)
                else: qref.append(0)
                qflag = False
            preqpos = qpos
            ref.append(l)
            if l == 1: qflag = True
        if qflag: qref.append(1)
        else: qref.append(0)

    preqpos = ""
    maxval = 0.0
    maxidx = -1
    # Prediction file: field 0 is the question ID, field 4 is the score.
    with open(fpred, "r") as f:
        for i, line in enumerate(f.readlines()):
            parts = line.strip().split()
            qpos, scr = parts[0], float(parts[4])
            if qpos != preqpos and preqpos != "":
                qpred_idx.append(maxidx)
                maxval = 0.0
                maxidx = -1
            preqpos = qpos
            if scr >= thre: pred.append(1)
            else: pred.append(0)
            # Track the line index of the top-scoring candidate per question.
            if scr > maxval:
                maxidx = i
                maxval = scr
        qpred_idx.append(maxidx)

    # Q-A (sentence) level precision and recall.
    match_cnt, ref_cnt, pred_cnt = 0.0, 0.0, 0.0
    for r, p in zip(ref, pred):
        if r == 1: ref_cnt += 1.0
        if p == 1: pred_cnt += 1.0
        if r == 1 and p == 1: match_cnt += 1.0
    prec, reca = match_cnt / pred_cnt, match_cnt / ref_cnt

    # Q (question) level precision and recall, using the top candidate only.
    # pred holds 0/1 flags, so pred[pidx] >= thre is true only for flag 1
    # (assuming 0 < thre <= 1).
    match_cnt, ref_cnt, pred_cnt = 0.0, 0.0, 0.0
    for r, pidx in zip(qref, qpred_idx):
        if r == 1: ref_cnt += 1.0
        if pred[pidx] >= thre: pred_cnt += 1.0
        if r == 1 and pred[pidx] >= thre and ref[pidx] == 1: match_cnt += 1.0
    qprec, qreca = match_cnt / pred_cnt, match_cnt / ref_cnt

    # Q level accuracy: right answer triggered, or correctly no answer.
    qmatch_cnt, qcnt = 0.0, 0.0
    for r, pidx in zip(qref, qpred_idx):
        qcnt += 1.0
        if r == 1 and pred[pidx] >= thre and ref[pidx] == 1: qmatch_cnt += 1.0
        elif r == 0 and pred[pidx] < thre: qmatch_cnt += 1.0
    qacc = qmatch_cnt / qcnt

    return [prec, reca, 2.0*prec*reca/(prec+reca), qprec, qreca, 2.0*qprec*qreca/(qprec+qreca), qacc]


if __name__ == "__main__":
    refname, predname = sys.argv[1], sys.argv[2]
    thre = 0.11
    if len(sys.argv) > 3:
        thre = float(sys.argv[3])
    results = get_prf(refname, predname, thre=thre)
    print("WikiQA Question Triggering: precision = %.4f, recall = %.4f, F1 = %.4f" % (results[3], results[4], results[5]))
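The question-level part of the script above can be hard to follow because it works through line indices. The sketch below restates the same idea more directly: for each question, only the top-scoring candidate is considered, the system "triggers" when that score reaches the threshold, and a triggered answer is correct when the candidate is labeled 1. It is an illustrative rewrite, not the script shipped in the archive: the names `safe_div` and `question_level_prf` and the `(question_id, label, score)` tuple layout are assumptions, and unlike the script it returns 0.0 instead of raising when a denominator is zero.

```python
from itertools import groupby


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def question_level_prf(rows, thre=0.11):
    """Question-level precision, recall and F1 for answer triggering.

    rows: iterable of (question_id, label, score) tuples (hypothetical
    layout), with all candidates of one question adjacent to each other.
    """
    match_cnt = ref_cnt = pred_cnt = 0
    for _, group in groupby(rows, key=lambda r: r[0]):
        cands = list(group)
        # Does this question have at least one correct sentence?
        has_answer = any(label == 1 for _, label, _ in cands)
        # Only the top-scoring candidate is considered for triggering.
        _, top_label, top_score = max(cands, key=lambda r: r[2])
        triggered = top_score >= thre
        if has_answer:
            ref_cnt += 1
        if triggered:
            pred_cnt += 1
        if triggered and top_label == 1:
            match_cnt += 1
    prec = safe_div(match_cnt, pred_cnt)
    reca = safe_div(match_cnt, ref_cnt)
    f1 = safe_div(2.0 * prec * reca, prec + reca)
    return prec, reca, f1
```

Guarding the divisions matters in practice: with a high threshold no candidate may trigger, in which case the predicted count is zero and an unguarded division fails.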
Attribute  Size     Date        Time   Name
---------  -------  ----------  -----  ----
Directory  0        2015-08-25  12:30  WikiQACorpus\emnlp-table\
File       75927    2015-08-25  12:30  WikiQACorpus\emnlp-table\WikiQA.CNN.dev.rank
File       172794   2015-08-25  12:30  WikiQACorpus\emnlp-table\WikiQA.CNN.test.rank
File       56909    2015-08-25  12:30  WikiQACorpus\emnlp-table\WikiQA.CNN-Cnt.dev.rank
File       129982   2015-08-25  12:30  WikiQACorpus\emnlp-table\WikiQA.CNN-Cnt.test.rank
File       2526     2015-08-25  12:30  WikiQACorpus\eval.py
File       791066   2013-11-15  12:51  WikiQACorpus\Guidelines_Phase1.pdf
File       356867   2013-12-10  14:16  WikiQACorpus\Guidelines_Phase2.pdf
File       6269938  2015-08-25  12:30  WikiQACorpus\WikiQA.tsv
File       26846    2015-08-25  12:30  WikiQACorpus\WikiQA-dev.ref
File       577103   2015-08-25  12:30  WikiQACorpus\WikiQA-dev.tsv
File       483564   2015-08-25  12:30  WikiQACorpus\WikiQA-dev.txt
File       11188    2015-08-25  12:30  WikiQACorpus\WikiQA-dev-filtered.ref
File       541172   2015-08-25  15:06  WikiQACorpus\WikiQASent.pos.ans.tsv
File       62167    2015-08-25  12:30  WikiQACorpus\WikiQA-test.ref
File       1304776  2015-08-25  12:30  WikiQACorpus\WikiQA-test.tsv
File       1103628  2015-08-25  12:30  WikiQACorpus\WikiQA-test.txt
File       23699    2015-08-25  12:30  WikiQACorpus\WikiQA-test-filtered.ref
File       217757   2015-08-25  12:30  WikiQACorpus\WikiQA-train.ref
File       4358942  2015-08-25  12:30  WikiQACorpus\WikiQA-train.tsv
File       3671710  2015-08-25  12:30  WikiQACorpus\WikiQA-train.txt
File       251928   2016-07-13  16:57  WikiQACorpus\LICENSE.pdf
File       4680     2016-07-13  17:00  WikiQACorpus\README.txt
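The evaluation script pairs a reference file with a prediction file line by line. Judging only from how it indexes each line, the first whitespace-separated field is the question ID, the fourth field of a reference line is the 0/1 label, and the fifth field of a prediction line is the score. The sketch below joins two such files into `(question_id, label, score)` tuples under that assumption. The function name `load_rows` is hypothetical, the meaning of the remaining fields is not stated here so they are ignored, and the sample lines in the usage note are made up for illustration rather than taken from the corpus.

```python
def load_rows(ref_lines, rank_lines):
    """Join line-aligned reference and prediction lines.

    Assumed layout (inferred from the evaluation script's indexing only):
      reference line:  field 0 = question ID, field 3 = 0/1 label
      prediction line: field 0 = question ID, field 4 = score
    Returns a list of (question_id, label, score) tuples.
    """
    rows = []
    for ref_line, rank_line in zip(ref_lines, rank_lines):
        ref_parts = ref_line.split()
        rank_parts = rank_line.split()
        # The two files are expected to list candidates in the same order.
        if ref_parts[0] != rank_parts[0]:
            raise ValueError("question IDs differ: %s vs %s"
                             % (ref_parts[0], rank_parts[0]))
        rows.append((ref_parts[0], int(ref_parts[3]), float(rank_parts[4])))
    return rows
```

For example, `load_rows(["1 0 0 1"], ["1 0 0 0 0.87 run"])` yields one tuple with question ID `"1"`, label `1`, and score `0.87`. Checking that the question IDs agree catches a mismatched pair of files early, which the original script would silently score.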