2016
Exploring OOV Words from Myanmar Text Using Maximal Substrings
PROCEEDINGS 2016 5TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS IIAI-AAI 2016
- Start page
- 657
- End page
- 663
- Language
- English
- Publication type
- Research paper (international conference proceedings)
- DOI
- 10.1109/IIAI-AAI.2016.73
- Publisher
- IEEE
This paper proposes a method for exploring out-of-vocabulary (OOV) words from Myanmar text by using maximal substrings. Our main purpose is to find OOV words that can be added to the Myanmar dictionary. The output of our method is a set of new compound words that do not exist in the Myanmar dictionary. Our method consists of two steps. In the first step, we extract maximal substrings from Myanmar news articles, i.e., substrings whose number of occurrences decreases whenever a character is appended before or after them. In the second step, we post-process the maximal substrings, because the extracted substrings contain noisy characters. Our post-processing is threefold. First, we reduce the number of maximal substrings. Second, we remove maximal substrings whose prefixes and suffixes are meaningless characters. Third, we find OOV words that are substrings consisting of two words from the existing dictionary. Consequently, we obtain substrings as candidates for new compound words that can be inserted into the existing Myanmar dictionary after being scrutinized by native speakers. We evaluate the accuracy of the new compound words subjectively, and the results appear promising. We argue that the new compound words obtained by our method are useful for expressing words as a single unit of meaning that can be used effectively in Myanmar text.
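The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It counts substrings by brute force for clarity, whereas a practical system would more likely use a suffix array or similar index. The function names, the `min_count` and `max_len` thresholds (standing in for the unspecified "reduce the number of maximal substrings" step), and the Latin-alphabet toy data are all assumptions made for illustration. The sketch also treats each Unicode code point as one character; real Myanmar text would probably need syllable or grapheme-cluster segmentation first.

```python
from collections import Counter


def count_substrings(texts, max_len):
    """Count every occurrence of every substring of length 1..max_len."""
    counts = Counter()
    for text in texts:
        n = len(text)
        for i in range(n):
            for j in range(i + 1, min(n, i + max_len) + 1):
                counts[text[i:j]] += 1
    return counts


def maximal_substrings(texts, min_count=2, max_len=10):
    """Return {substring: count} for substrings whose count strictly drops
    when any single character is appended before or after them.

    min_count and max_len are hypothetical pruning thresholds.
    """
    # Count one character longer than max_len so that the extensions of
    # length-max_len substrings can be checked too.
    counts = count_substrings(texts, max_len + 1)

    # For each substring, the highest count among its one-character
    # left extensions and right extensions.
    best_left = Counter()
    best_right = Counter()
    for s, c in counts.items():
        if len(s) < 2:
            continue
        best_left[s[1:]] = max(best_left[s[1:]], c)    # s extends s[1:] on the left
        best_right[s[:-1]] = max(best_right[s[:-1]], c)  # s extends s[:-1] on the right

    return {
        s: c
        for s, c in counts.items()
        if len(s) <= max_len
        and c >= min_count
        and best_left[s] < c
        and best_right[s] < c
    }


def compound_candidates(substrings, dictionary):
    """Keep substrings that are not in the dictionary but split into
    exactly two dictionary words; map each to its first such split."""
    found = {}
    for s in substrings:
        if s in dictionary:
            continue
        for k in range(1, len(s)):
            if s[:k] in dictionary and s[k:] in dictionary:
                found[s] = (s[:k], s[k:])
                break
    return found


if __name__ == "__main__":
    # Toy data in Latin script; the surrounding letters play the role of noise.
    texts = ["xsunflowery", "zsunflowerq"]
    dictionary = {"sun", "flower"}
    subs = maximal_substrings(texts)
    print(subs)                                   # {'sunflower': 2}
    print(compound_candidates(subs, dictionary))  # {'sunflower': ('sun', 'flower')}
```

In this toy run, every shorter piece of "sunflower" is rejected because extending it inside the word keeps the count at 2, while "sunflower" itself survives because any further extension occurs only once. The sketch omits the abstract's second post-processing step (removing substrings with meaningless prefixes and suffixes), since the source does not specify how those characters are identified.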
- Links
- DOI
- https://doi.org/10.1109/IIAI-AAI.2016.73
- DBLP
- https://dblp.uni-trier.de/rec/conf/iiaiaai/WinM16
- Web of Science
- https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000389501300128&DestApp=WOS_CPL
- URL
- http://doi.ieeecomputersociety.org/10.1109/IIAI-AAI.2016.73
- URL
- http://dblp.uni-trier.de/db/conf/iiaiaai/iiaiaai2016.html#conf/iiaiaai/WinM16
- ID information
- DOI : 10.1109/IIAI-AAI.2016.73
- DBLP ID : conf/iiaiaai/WinM16
- Web of Science ID : WOS:000389501300128