Named entity recognizer trainable from partially annotated data

Communications in Computer and Information Science

Tetsuro Sasada
Shinsuke Mori
Tatsuya Kawahara
Yoko Yamakata

Volume: 593
Issue:
First page: 148
Last page: 160
Language: English
Publication type: Research paper (international conference proceedings)
DOI: 10.1007/978-981-10-0515-2_11
Publisher: Springer Verlag

In this paper we propose a named entity recognizer (NER) that can be trained from partially annotated data. As natural language processing is applied to increasingly diverse texts, demand is growing for NERs that handle new named entity (NE) definitions in different domains. For these special NE definitions, only a small annotated corpus is available at the beginning, and in practice an NER must be developed rapidly and at low cost. To meet these needs, we propose the use of partially annotated data, that is, a set of sentences in which only a limited number of words are annotated with NE tags. Our NER method uses a two-pass search for sequential labeling of NE tags: (1) enumerate NE tags with confidences for each word, independently of the tags of other words, and (2) search for the best NE tag sequence with CRFs, referring to the tag-confidence pairs. For the first-pass module, our method uses partially annotated data to improve accuracy in the target domain. With this two-pass search framework, our method is expected to incorporate tag sequence statistics and to outperform state-of-the-art NERs based on sequence labeling while retaining high domain adaptability. We conducted several experiments comparing our method with state-of-the-art NERs in various scenarios. The results showed that our method is effective both in the normal case and in adaptation cases.
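The two-pass search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tag set, the `lexicon` lookup that stands in for a trained pointwise classifier, and the hand-set `transition` scores that stand in for learned CRF weights are all assumptions made for illustration. The second pass is written as a plain Viterbi search, and `annotated_instances` shows one possible representation of a partially annotated sentence, with `None` marking an unannotated word.

```python
import math

# Hypothetical BIO tag set for a single entity type (an assumption).
TAGS = ["B-NE", "I-NE", "O"]


def annotated_instances(sentence):
    """Keep only the annotated words of a partially annotated sentence.

    `sentence` is a list of (word, tag) pairs; tag is None when the word
    was left unannotated. Only the kept items would be used to train the
    first-pass (pointwise) model.
    """
    return [(i, w, t) for i, (w, t) in enumerate(sentence) if t is not None]


def first_pass(words, lexicon):
    """Pass 1: tag confidences for each word, independent of other tags.

    A toy lookup table stands in for a pointwise classifier; unknown
    words get a uniform distribution.
    """
    uniform = {t: 1.0 / len(TAGS) for t in TAGS}
    return [lexicon.get(w, uniform) for w in words]


def second_pass(confidences, transition):
    """Pass 2: Viterbi search for the best tag sequence.

    Combines the log of each pass-1 confidence with a tag-transition
    score. Missing transitions score 0.0; "<s>" marks sentence start.
    """
    def emit(conf, tag):
        return math.log(max(conf[tag], 1e-12))  # floor avoids log(0)

    # Each column maps tag -> (best score so far, previous tag).
    columns = [{
        t: (emit(confidences[0], t) + transition.get(("<s>", t), 0.0), None)
        for t in TAGS
    }]
    for conf in confidences[1:]:
        column = {}
        for t in TAGS:
            prev = max(
                TAGS,
                key=lambda p: columns[-1][p][0] + transition.get((p, t), 0.0),
            )
            score = columns[-1][prev][0] + transition.get((prev, t), 0.0)
            column[t] = (score + emit(conf, t), prev)
        columns.append(column)

    # Backtrace from the best final tag.
    tag = max(TAGS, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Toy inputs (all values are made up for illustration).
lexicon = {
    "visit": {"B-NE": 0.05, "I-NE": 0.05, "O": 0.90},
    "New": {"B-NE": 0.60, "I-NE": 0.30, "O": 0.10},
    "York": {"B-NE": 0.45, "I-NE": 0.40, "O": 0.15},
}
transition = {
    ("<s>", "I-NE"): -10.0,   # an entity cannot start with I-NE
    ("O", "I-NE"): -10.0,     # I-NE cannot follow O
    ("B-NE", "I-NE"): 1.0,    # continuing an entity is favored
}
words = ["visit", "New", "York"]
print(second_pass(first_pass(words, lexicon), transition))
```

In this toy input, choosing the most confident tag for each word independently would label "York" as B-NE, whereas the sequence search prefers I-NE after B-NE. This is the kind of tag sequence statistic the second pass is meant to contribute.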

Links

DOI: https://doi.org/10.1007/978-981-10-0515-2_11

Identifiers

DOI : 10.1007/978-981-10-0515-2_11
ISSN : 1865-0929
SCOPUS ID : 84961191752
