Resource description
We describe the WikiQA dataset, a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering. Most previous work on answer sentence selection focuses on a dataset created using the TREC-QA data, which includes editor-generated questions and candidate answer sentences selected by matching content words in the question. WikiQA is constructed using a more natural process and is more than an order of magnitude larger than the previous dataset. In addition, the WikiQA dataset also includes questions for which there are no correct sentences, enabling researchers to work on answer triggering, a critical component in any QA system. We compare several systems on the task of answer sentence selection on both datasets and also describe the performance of a system on the problem of answer triggering using the WikiQA dataset.
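
Answer triggering, as described above, can be pictured as a thresholded selection: the system returns its top-scoring candidate sentence only when that score clears a cutoff, and otherwise abstains. The following is a minimal Python 3 sketch under that assumption. The function name `trigger_answer`, the example sentences, the scores, and the 0.5 cutoff are all hypothetical and are not part of the WikiQA release.

```python
from typing import List, Optional, Tuple


def trigger_answer(candidates: List[Tuple[str, float]],
                   threshold: float = 0.5) -> Optional[str]:
    """Return the best-scoring sentence, or None to abstain.

    `candidates` pairs each candidate sentence with a model score.
    The default threshold is an illustrative assumption.
    """
    if not candidates:
        return None
    sentence, score = max(candidates, key=lambda pair: pair[1])
    return sentence if score >= threshold else None


# Hypothetical scores for one answerable and one unanswerable question.
answerable = [("Sentence A", 0.82), ("Sentence B", 0.31)]
unanswerable = [("Sentence C", 0.12), ("Sentence D", 0.07)]

print(trigger_answer(answerable))    # Sentence A
print(trigger_answer(unanswerable))  # None
```

Returning `None` rather than the weak top candidate is what separates answer triggering from plain answer sentence selection, which always returns a sentence.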

Code snippet and file information
import os, sys


def get_prf(fref, fpred, thre=0.1):
    """
    Get Q-A level and Q level precision, recall, and F-measure.
    """
    ref, pred = [], []
    qref, qpred_idx = [], []

    # Read reference labels; qref marks whether each question has any correct answer.
    preqpos = ""
    qflag = False
    with open(fref, "r") as f:
        for line in f:
            parts = line.strip().split()
            qpos, l = parts[0], int(parts[3])
            if qpos != preqpos and preqpos != "":
                qref.append(1 if qflag else 0)
                qflag = False
            preqpos = qpos
            ref.append(l)
            if l == 1:
                qflag = True
        qref.append(1 if qflag else 0)

    # Read predicted scores; qpred_idx stores the line index of each question's top score.
    # Assumes the prediction file is line-aligned with the reference file and that
    # every question has at least one score > 0 (otherwise maxidx stays -1).
    preqpos = ""
    maxval = 0.0
    maxidx = -1
    with open(fpred, "r") as f:
        for i, line in enumerate(f.readlines()):
            parts = line.strip().split()
            qpos, scr = parts[0], float(parts[4])
            if qpos != preqpos and preqpos != "":
                qpred_idx.append(maxidx)
                maxval = 0.0
                maxidx = -1
            preqpos = qpos
            pred.append(1 if scr >= thre else 0)
            if scr > maxval:
                maxidx = i
                maxval = scr
        qpred_idx.append(maxidx)

    # Q-A (sentence) level precision and recall.
    match_cnt, ref_cnt, pred_cnt = 0.0, 0.0, 0.0
    for r, p in zip(ref, pred):
        if r == 1: ref_cnt += 1.0
        if p == 1: pred_cnt += 1.0
        if r == 1 and p == 1: match_cnt += 1.0
    prec, reca = match_cnt / pred_cnt, match_cnt / ref_cnt

    # Q (question) level precision and recall.
    # pred holds 0/1 flags, so the comparisons below assume 0 < thre <= 1.
    match_cnt, ref_cnt, pred_cnt = 0.0, 0.0, 0.0
    for r, pidx in zip(qref, qpred_idx):
        if r == 1: ref_cnt += 1.0
        if pred[pidx] >= thre: pred_cnt += 1.0
        if r == 1 and pred[pidx] >= thre and ref[pidx] == 1: match_cnt += 1.0
    qprec, qreca = match_cnt / pred_cnt, match_cnt / ref_cnt

    # Q level accuracy: correct answer triggered, or correctly abstained.
    qmatch_cnt, qcnt = 0.0, 0.0
    for r, pidx in zip(qref, qpred_idx):
        qcnt += 1.0
        if r == 1 and pred[pidx] >= thre and ref[pidx] == 1: qmatch_cnt += 1.0
        elif r == 0 and pred[pidx] < thre: qmatch_cnt += 1.0
    qacc = qmatch_cnt / qcnt

    return [prec, reca, 2.0*prec*reca/(prec+reca),
            qprec, qreca, 2.0*qprec*qreca/(qprec+qreca), qacc]


if __name__ == "__main__":
    refname, predname = sys.argv[1], sys.argv[2]
    thre = 0.11
    if len(sys.argv) > 3:
        thre = float(sys.argv[3])
    results = get_prf(refname, predname, thre=thre)
    print("WikiQA Question Triggering: precision = %.4f, recall = %.4f, F1 = %.4f" % (results[3], results[4], results[5]))

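
To make the question-level numbers concrete, here is a small in-memory Python 3 sketch of the same idea: for each question, take the top-scoring candidate, count it as a prediction if its score reaches the threshold, and count it as a match if that candidate is labeled correct. The helper name `question_level_prf` and the toy data are hypothetical. Unlike the script above, this sketch compares the raw score against the threshold and returns 0.0 instead of dividing by zero.

```python
from typing import Dict, List, Tuple


def question_level_prf(questions: Dict[str, List[Tuple[int, float]]],
                       threshold: float) -> Tuple[float, float, float]:
    """Compute question-level precision, recall, and F1.

    Each question id maps to a list of (label, score) candidate pairs,
    where label is 1 for a correct sentence and 0 otherwise.
    """
    match = predicted = answerable = 0
    for cands in questions.values():
        if not cands:
            continue
        if any(label == 1 for label, _ in cands):
            answerable += 1
        # Top-scoring candidate for this question.
        label, score = max(cands, key=lambda c: c[1])
        if score >= threshold:
            predicted += 1
            if label == 1:
                match += 1
    prec = match / predicted if predicted else 0.0
    reca = match / answerable if answerable else 0.0
    f1 = 2 * prec * reca / (prec + reca) if prec + reca else 0.0
    return prec, reca, f1


# Hypothetical toy data: four questions with (label, score) candidates.
toy = {
    "Q1": [(1, 0.9), (0, 0.2)],    # triggered, correct
    "Q2": [(0, 0.6), (1, 0.3)],    # triggered, wrong sentence
    "Q3": [(0, 0.05), (0, 0.02)],  # no correct answer, abstained
    "Q4": [(1, 0.4), (0, 0.1)],    # answerable, but abstained
}

p, r, f = question_level_prf(toy, threshold=0.5)
print("precision = %.4f, recall = %.4f, F1 = %.4f" % (p, r, f))
# precision = 0.5000, recall = 0.3333, F1 = 0.4000
```

In the toy data, two questions are triggered and one of them is right (precision 1/2), while three questions are answerable and one is answered correctly (recall 1/3).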
Attribute      Size  Date        Time   Name
---------  --------  ----------  -----  ----
Directory         0  2015-08-25  12:30  WikiQACorpus\emnlp-table\
File          75927  2015-08-25  12:30  WikiQACorpus\emnlp-table\WikiQA.CNN.dev.rank
File         172794  2015-08-25  12:30  WikiQACorpus\emnlp-table\WikiQA.CNN.test.rank
File          56909  2015-08-25  12:30  WikiQACorpus\emnlp-table\WikiQA.CNN-Cnt.dev.rank
File         129982  2015-08-25  12:30  WikiQACorpus\emnlp-table\WikiQA.CNN-Cnt.test.rank
File           2526  2015-08-25  12:30  WikiQACorpus\eval.py
File         791066  2013-11-15  12:51  WikiQACorpus\Guidelines_Phase1.pdf
File         356867  2013-12-10  14:16  WikiQACorpus\Guidelines_Phase2.pdf
File        6269938  2015-08-25  12:30  WikiQACorpus\WikiQA.tsv
File          26846  2015-08-25  12:30  WikiQACorpus\WikiQA-dev.ref
File         577103  2015-08-25  12:30  WikiQACorpus\WikiQA-dev.tsv
File         483564  2015-08-25  12:30  WikiQACorpus\WikiQA-dev.txt
File          11188  2015-08-25  12:30  WikiQACorpus\WikiQA-dev-filtered.ref
File         541172  2015-08-25  15:06  WikiQACorpus\WikiQASent.pos.ans.tsv
File          62167  2015-08-25  12:30  WikiQACorpus\WikiQA-test.ref
File        1304776  2015-08-25  12:30  WikiQACorpus\WikiQA-test.tsv
File        1103628  2015-08-25  12:30  WikiQACorpus\WikiQA-test.txt
File          23699  2015-08-25  12:30  WikiQACorpus\WikiQA-test-filtered.ref
File         217757  2015-08-25  12:30  WikiQACorpus\WikiQA-train.ref
File        4358942  2015-08-25  12:30  WikiQACorpus\WikiQA-train.tsv
File        3671710  2015-08-25  12:30  WikiQACorpus\WikiQA-train.txt
File         251928  2016-07-13  16:57  WikiQACorpus\LICENSE.pdf
File           4680  2016-07-13  17:00  WikiQACorpus\README.txt