Paper

Peer-reviewed
2016

Exploring OOV Words from Myanmar Text Using Maximal Substrings

PROCEEDINGS 2016 5TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS IIAI-AAI 2016
  • Yuzana Win
  • Tomonari Masada

Start page
657
End page
663
Language
English
Publication type
Research paper (international conference proceedings)
DOI
10.1109/IIAI-AAI.2016.73
Publisher
IEEE

This paper proposes a method for finding out-of-vocabulary (OOV) words in Myanmar text by using maximal substrings. Our main purpose is to find OOV words that can be added to the Myanmar dictionary. The outcomes of our method are new compound words that do not exist in the Myanmar dictionary. Our method consists of two steps. In the first step, we extract maximal substrings, i.e., substrings whose number of occurrences decreases whenever a character is appended before or after them, from Myanmar news articles. In the second step, we post-process the maximal substrings, because the extracted maximal substrings contain noisy characters. Our post-processing is threefold. First, we reduce the number of maximal substrings. Second, we remove maximal substrings whose prefixes or suffixes are meaningless characters. Third, we find OOV words, namely substrings that consist of two words from the existing dictionary. Consequently, we obtain substrings as candidates for new compound words that can be inserted into the existing Myanmar dictionary after being scrutinized by native speakers. We evaluate the accuracy of the new compound words by subjective judgment. The results appear promising. We argue that the new compound words obtained by our method are useful for expressing words as a single unit of meaning that can be used effectively in Myanmar text.
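
The two ideas in the abstract, maximal substring extraction and the check for a substring made of two dictionary words, can be illustrated with a minimal sketch. This is not the authors' implementation: it counts substrings by brute force, which is only practical for short inputs, and treats each Unicode code point as one character. The function names and the `min_count` and `max_len` parameters are assumptions introduced for illustration, and the toy inputs are English stand-ins for Myanmar text.

```python
from collections import Counter


def count_substrings(text, max_len):
    """Count every substring of length 1..max_len (overlaps included)."""
    counts = Counter()
    n = len(text)
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            counts[text[i:j]] += 1
    return counts


def maximal_substrings(text, min_count=2, max_len=None):
    """Return {substring: count} for substrings whose count drops
    whenever any single character is appended before or after them.

    min_count and max_len are illustrative filters (assumptions),
    not parameters taken from the paper.
    """
    if max_len is None:
        max_len = len(text)
    # Count one character longer so that extensions can be looked up.
    counts = count_substrings(text, max_len + 1)
    alphabet = set(text)
    result = {}
    for s, c in counts.items():
        if c < min_count or len(s) > max_len:
            continue
        if all(
            counts.get(ch + s, 0) < c and counts.get(s + ch, 0) < c
            for ch in alphabet
        ):
            result[s] = c
    return result


def is_two_word_compound(candidate, dictionary):
    """True if candidate is not in the dictionary itself but splits
    into exactly two dictionary words."""
    if candidate in dictionary:
        return False
    return any(
        candidate[:i] in dictionary and candidate[i:] in dictionary
        for i in range(1, len(candidate))
    )


if __name__ == "__main__":
    print(maximal_substrings("xabcyabcz"))
    print(is_two_word_compound("newspaper", {"news", "paper"}))
```

In the toy string `xabcyabcz`, `ab` is not maximal because extending it to `abc` leaves its count unchanged, whereas every one-character extension of `abc` occurs fewer times. For corpus-scale input a suffix array or similar index would be the usual replacement for the brute-force counter.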

Links
DOI
https://doi.org/10.1109/IIAI-AAI.2016.73
DBLP
https://dblp.uni-trier.de/rec/conf/iiaiaai/WinM16
Web of Science
https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000389501300128&DestApp=WOS_CPL
URL
http://doi.ieeecomputersociety.org/10.1109/IIAI-AAI.2016.73
URL
http://dblp.uni-trier.de/db/conf/iiaiaai/iiaiaai2016.html#conf/iiaiaai/WinM16
Identifiers
  • DOI : 10.1109/IIAI-AAI.2016.73
  • DBLP ID : conf/iiaiaai/WinM16
  • Web of Science ID : WOS:000389501300128
