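Resource overview
The data uses the BIO tag set: B-PER and I-PER mark the first and non-first characters of a person name, B-LOC and I-LOC mark the first and non-first characters of a location name, B-ORG and I-ORG mark the first and non-first characters of an organization name, and O marks a character that is not part of any named entity.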
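To make the tagging scheme concrete, the sketch below groups per-character BIO tags into entity spans. It is a minimal illustration, not code from the archive: the function name `bio_to_entities` and the example sentence are assumptions, and a stray I- tag with no open entity is simply ignored, which is a simplification.

```python
def bio_to_entities(chars, tags):
    """Hypothetical helper: group per-character BIO tags into entity spans.

    Returns a list of (entity_type, start, end, text); `end` is exclusive.
    """
    entities = []
    start, etype = None, None
    # A trailing 'O' sentinel flushes an entity that ends at the last character.
    for i, tag in enumerate(list(tags) + ['O']):
        prefix, _, label = tag.partition('-')
        # An I- tag continues the open entity only if the types match.
        if prefix == 'I' and start is not None and label == etype:
            continue
        # Anything else closes the open entity, if there is one.
        if start is not None:
            entities.append((etype, start, i, ''.join(chars[start:i])))
            start, etype = None, None
        # A B- tag opens a new entity; stray I- tags are ignored (simplification).
        if prefix == 'B':
            start, etype = i, label
    return entities


chars = list('李明在北京大学工作')
tags = ['B-PER', 'I-PER', 'O', 'B-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'O', 'O']
print(bio_to_entities(chars, tags))
```

Each tuple carries the entity type, the character offsets and the surface string, which is usually enough to inspect a tagger's output by eye.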
Code snippet and file information
# Python version of the evaluation script from CoNLL'00-
# Originates from: https://github.com/spyysalo/conlleval.py

# Intentional differences:
# - accept any space as delimiter by default
# - optional file argument (default STDIN)
# - option to set boundary (-b argument)
# - LaTeX output (-l argument) not supported
# - raw tags (-r argument) not supported

import sys
import re
import codecs
from collections import defaultdict, namedtuple

ANY_SPACE = '<SPACE>'


class FormatError(Exception):
    pass


Metrics = namedtuple('Metrics', 'tp fp fn prec rec fscore')


class EvalCounts(object):
    def __init__(self):
        self.correct_chunk = 0    # number of correctly identified chunks
        self.correct_tags = 0     # number of correct chunk tags
        self.found_correct = 0    # number of chunks in corpus
        self.found_guessed = 0    # number of identified chunks
        self.token_counter = 0    # token counter (ignores sentence breaks)

        # counts by type
        self.t_correct_chunk = defaultdict(int)
        self.t_found_correct = defaultdict(int)
        self.t_found_guessed = defaultdict(int)


def parse_args(argv):
    import argparse
    parser = argparse.ArgumentParser(
        description='evaluate tagging results using CoNLL criteria',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    arg = parser.add_argument
    arg('-b', '--boundary', metavar='STR', default='-X-',
        help='sentence boundary')
    arg('-d', '--delimiter', metavar='CHAR', default=ANY_SPACE,
        help='character delimiting items in input')
    arg('-o', '--otag', metavar='CHAR', default='O',
        help='alternative outside tag')
    arg('file', nargs='?', default=None)
    return parser.parse_args(argv)


def parse_tag(t):
    # split a tag such as 'B-PER' into ('B', 'PER'); a bare 'O' gives ('O', '')
    m = re.match(r'^([^-]*)-(.*)$', t)
    return m.groups() if m else (t, '')


def evaluate(iterable, options=None):
    if options is None:
        options = parse_args([])    # use defaults

    counts = EvalCounts()
    num_features = None       # number of features per line
    in_correct = False        # currently processed chunk is correct so far
    last_correct = 'O'        # previous chunk tag in corpus
    last_correct_type = ''    # type of previous chunk tag in corpus
    last_guessed = 'O'        # previously identified chunk tag
    last_guessed_type = ''    # type of previously identified chunk tag

    for line in iterable:
        line = line.rstrip('\r\n')

        if options.delimiter == ANY_SPACE:
            features = line.split()
        else:
            features = line.split(options.delimiter)

        if num_features is None:
            num_features = len(features)
        elif num_features != len(features) and len(features) != 0:
            raise FormatError('unexpected number of features: %d (%d)' %
                              (len(features), num_features))

        if len(features) == 0 or features[0] == options.boundary:
            features = [options.boundary
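
The counters kept in EvalCounts are enough to fill the fields of the Metrics tuple. The sketch below assumes the standard definitions of precision, recall and F1 at chunk level; the helper name `metrics_from_counts` and the example numbers are hypothetical and are not taken from the excerpt.

```python
from collections import namedtuple

Metrics = namedtuple('Metrics', 'tp fp fn prec rec fscore')


def metrics_from_counts(correct, guessed, total):
    """Hypothetical helper: derive chunk-level metrics from three counters.

    correct: chunks found with the right span and type (cf. correct_chunk)
    guessed: chunks the tagger produced (cf. found_guessed)
    total:   chunks present in the gold corpus (cf. found_correct)
    """
    tp = correct
    fp = guessed - correct
    fn = total - correct
    # Guard every division so empty inputs yield 0.0 instead of raising.
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    fscore = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return Metrics(tp, fp, fn, prec, rec, fscore)


m = metrics_from_counts(correct=8, guessed=10, total=16)
print(m)
```

Because a chunk counts as correct only when both its boundaries and its type match, these scores are stricter than per-character tag accuracy.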
Attr         Size       Date        Time   Name
-----------  ---------  ----------  -----  ----
File             12728  2017-07-05  00:18  ChineseNER-master(来源联合数据)\conlleval
File             10110  2017-07-05  00:18  ChineseNER-master(来源联合数据)\conlleval.py
File             10110  2017-07-05  00:18  ChineseNER-master(来源联合数据)\data\conlleval.py
File           1383712  2017-07-05  00:18  ChineseNER-master(来源联合数据)\data\example.dev
File           1405788  2017-07-05  00:18  ChineseNER-master(来源联合数据)\data\example.test
File           5596172  2017-07-05  00:18  ChineseNER-master(来源联合数据)\data\example.train
File              8104  2017-07-05  00:18  ChineseNER-master(来源联合数据)\data_utils.py
File              5782  2017-07-05  00:18  ChineseNER-master(来源联合数据)\loader.py
File              8918  2017-07-05  00:18  ChineseNER-master(来源联合数据)\main.py
File             11605  2017-07-05  00:18  ChineseNER-master(来源联合数据)\model.py
File              1273  2017-07-05  00:18  ChineseNER-master(来源联合数据)\README.md
File              9470  2017-07-05  00:18  ChineseNER-master(来源联合数据)\rnncell.py
File              6038  2017-07-05  00:18  ChineseNER-master(来源联合数据)\utils.py
File          15335492  2017-07-05  00:18  ChineseNER-master(来源联合数据)\wiki_100.utf8
Dir                  0  2018-08-06  17:18  ChineseNER-master(来源联合数据)\data
Dir                  0  2018-08-06  17:19  ChineseNER-master(来源联合数据)
-----------  ---------  ----------  -----  ----
              23805302                     16