A comparative study of dictionaries and corpora as methods for language resource addition

LANGUAGE RESOURCES AND EVALUATION

Shinsuke Mori
Graham Neubig

Volume: 50
Issue: 2
First page: 245
Last page: 261
Language: English
Publication type: Research paper (academic journal)
DOI: 10.1007/s10579-016-9354-7
Publisher: SPRINGER

In this paper, we investigate the relative effect of two strategies for language resource addition for Japanese morphological analysis, a joint task of word segmentation and part-of-speech tagging. The first strategy is adding entries to the dictionary, and the second is adding annotated sentences to the training corpus. The experimental results showed that adding annotated sentences to the training corpus is better than adding entries to the dictionary. Adding annotated sentences is especially efficient when we add new words with the contexts of several real occurrences as partially annotated sentences, i.e. sentences in which only some words are annotated with word boundary information. Based on this finding, we performed real annotation experiments on invention disclosure texts and observed word segmentation accuracy. Finally, we investigated various language resource addition cases and introduced the notions of non-maleficence, asymmetricity, and additivity of language resources for a task. In the word segmentation (WS) case, we found that language resource addition is non-maleficent (adding new resources causes no harm in other domains) and sometimes additive (adding new resources helps other domains). We conclude that it is reasonable for us, NLP tool providers, to distribute only one general-domain model trained from all the language resources we have.

Links

DOI: https://doi.org/10.1007/s10579-016-9354-7
Web of Science: https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000377898300004&DestApp=WOS_CPL

Identifiers

DOI : 10.1007/s10579-016-9354-7
ISSN : 1574-020X
eISSN : 1574-0218
Web of Science ID : WOS:000377898300004

