July 16, 2001

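Improving the Accuracy of Morphological Analysis and Kana-Kanji Conversion Using an Untagged Corpus (タグなしコーパスによる形態素解析と仮名漢字変換の精度向上)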

IPSJ SIG Technical Reports: Natural Language Processing (情報処理学会研究報告. 自然言語処理研究会報告)

Shinsuke Mori (森信介)
Nobuyasu Itoh (伊東伸泰)

Volume: 2001
Issue: 69
First page: 47
Last page: 54
Language: Japanese
Publisher: Information Processing Society of Japan (IPSJ, 一般社団法人情報処理学会)

A tagged training corpus plays an important role in natural language processing based on stochastic language models, and enlarging it is known to improve accuracy. A meaningful improvement, however, requires the corpus to grow more than exponentially, and the cost of annotating the additional text is no longer negligible. Against this background, this paper discusses improving the accuracy of morphological analysis and kana-kanji conversion by using an untagged corpus. In the experiments, using an untagged corpus significantly improved the predictive power of a stochastic language model and the accuracy of a kana-kanji converter based on it, and the untagged corpus proved comparable to a tagged corpus 0.87 times its size. For morphological analysis, however, the improvement was slight.
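The abstract reports gains in the "predictive power" of a stochastic language model without stating the metric here. A common way to quantify it is cross-entropy on held-out text (lower is better), optionally normalized per character so that models with different word segmentations remain comparable. The sketch below is illustrative only and is not the authors' method: it assumes a toy word-unigram model with additive (Laplace) smoothing, a closed vocabulary, and made-up training and test sentences; the function names are hypothetical.

```python
import math
from collections import Counter


def train_unigram(words):
    """Count word tokens in a (hypothetical) segmented training corpus."""
    counts = Counter(words)
    return counts, sum(counts.values())


def word_prob(word, counts, total, vocab_size, alpha=1.0):
    """Additively smoothed unigram probability (assumption: closed vocabulary)."""
    return (counts[word] + alpha) / (total + alpha * vocab_size)


def cross_entropy_per_char(test_words, counts, total, vocab_size):
    """Bits per character: -sum(log2 P(w)) over test words / number of characters."""
    log_prob = sum(math.log2(word_prob(w, counts, total, vocab_size)) for w in test_words)
    n_chars = sum(len(w) for w in test_words)
    return -log_prob / n_chars


# Toy data, invented for illustration only.
train = ["今日", "は", "晴れ", "です", "今日", "は", "雨", "です"]
test = ["今日", "は", "雪", "です"]

counts, total = train_unigram(train)
vocab_size = len(set(train) | set(test))  # toy choice: vocabulary known in advance
h = cross_entropy_per_char(test, counts, total, vocab_size)
print(f"cross-entropy: {h:.3f} bits/char, per-character perplexity: {2 ** h:.3f}")
```

Comparing such a number for a model trained on tagged data alone against one that also draws on untagged data is one way a claim like "the untagged corpus was comparable to a tagged corpus 0.87 times its size" could be operationalized, but the paper's actual measurement procedure is not described in this record.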

Links

CiNii Articles: http://ci.nii.ac.jp/naid/110002935309
CiNii Books: http://ci.nii.ac.jp/ncid/AN10115061
URL: http://id.ndl.go.jp/bib/5866211
URL: http://id.nii.ac.jp/1001/00048494/

Identifiers

ISSN : 0919-6072
CiNii Articles ID : 110002935309
CiNii Books ID : AN10115061
