Paper

Peer-reviewed
2016

Named entity recognizer trainable from partially annotated data

Communications in Computer and Information Science
  • Tetsuro Sasada
  • Shinsuke Mori
  • Tatsuya Kawahara
  • Yoko Yamakata

Volume
593
First page
148
Last page
160
Language
English
Publication type
Research paper (international conference proceedings)
DOI
10.1007/978-981-10-0515-2_11
Publisher
Springer Verlag

In this paper we propose a named entity recognizer (NER) that can be trained from partially annotated data. As natural language processing is applied to increasingly diverse texts, demand is growing for NERs that handle new named entity (NE) definitions in different domains. For such special NE definitions, only a small annotated corpus is available at the outset, and in practice an NER must be developed rapidly and at low cost. To meet these needs, we propose the use of partially annotated data, that is, a set of sentences in which only a limited number of words are annotated with NE tags. Our NER method uses a two-pass search for sequential labeling of NE tags: (1) enumerate NE tags with confidences for each word, independently of the tags of other words, and (2) search for the best NE tag sequence with CRFs, referring to the tag-confidence pairs. For the first-pass module, our method uses partially annotated data to improve accuracy in the target domain. With this two-pass search framework, our method is expected to incorporate tag sequence statistics and to outperform state-of-the-art NERs based on sequence labeling while retaining high domain adaptability. We conducted several experiments comparing our method with state-of-the-art NERs in various scenarios. The results showed that our method is effective both in the normal case and in adaptation cases.
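The two-pass search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the BIO tag set, the confidence values, and the hand-set transition penalty are all assumptions made for illustration, and a plain Viterbi search stands in for the trained CRF that the paper uses in the second pass. The first pass is represented only by its output, a per-word map from tag to confidence.

```python
import math

def viterbi(confidences, transition):
    """Second pass (stand-in for a trained CRF): return the tag sequence that
    maximizes the sum of log-confidences plus transition scores."""
    # best[i][tag] = (score of the best path ending in `tag` at word i, previous tag)
    best = [{tag: (math.log(p), None) for tag, p in confidences[0].items()}]
    for i in range(1, len(confidences)):
        column = {}
        for tag, p in confidences[i].items():
            prev_tag, score = max(
                ((pt, ps + transition.get((pt, tag), 0.0))
                 for pt, (ps, _) in best[i - 1].items()),
                key=lambda item: item[1],
            )
            column[tag] = (score + math.log(p), prev_tag)
        best.append(column)
    # Backtrack from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(best) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical first-pass output for the words "Kyoto University opened":
# each word is scored independently of its neighbours (values are invented).
confidences = [
    {"B-ORG": 0.45, "O": 0.55},
    {"I-ORG": 0.80, "O": 0.20},
    {"B-ORG": 0.10, "O": 0.90},
]
# Assumed transition scores: only the BIO-inconsistent move O -> I-ORG is penalized.
transition = {("O", "I-ORG"): -10.0}

# Picking the top tag per word ignores sequence information.
pointwise = [max(c, key=c.get) for c in confidences]
print(pointwise)                         # ['O', 'I-ORG', 'O']
print(viterbi(confidences, transition))  # ['B-ORG', 'I-ORG', 'O']
```

In this toy example the per-word choice yields an inconsistent sequence (I-ORG directly after O), while the sequence search recovers a consistent one from the same tag-confidence pairs, which is the kind of tag sequence information the second pass is meant to contribute.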

Links
DOI
https://doi.org/10.1007/978-981-10-0515-2_11
Identifiers
  • DOI : 10.1007/978-981-10-0515-2_11
  • ISSN : 1865-0929
  • SCOPUS ID : 84961191752
