Paper

Peer-reviewed
December 2021

Distribution and characteristics of commonly used words across different texts in Japanese

Language and Text: Data, models, information and applications
  • Makoto Yamazaki

Start page
121
End page
134
Language
English
Publication type
Research paper (international conference proceedings)
DOI
10.1075/cilt.356.08yam

In this chapter, I survey the frequency distribution of commonly used words across different texts in Japanese, using the Balanced Corpus of Contemporary Written Japanese (BCCWJ). The results show the following: (1) the distribution follows a curve similar to Zipf's law, but the curve always begins to rise shortly before the degree of commonality reaches its maximum; (2) neither the length nor the number of texts affects the overall trend of the distribution; and (3) as text length increases, the number of commonly used words also increases linearly, but it eventually reaches a ceiling because the number of basic words is limited.
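The abstract does not define "degree of commonality" or describe the counting procedure. The sketch below assumes it means the number of texts in which a word occurs and shows one way such a distribution could be tabulated. The function name `commonality_distribution` is hypothetical, the input is assumed to be already word-segmented (Japanese text would first need a morphological analyser, which is not shown), and this is not presented as the author's actual method.

```python
from collections import Counter


def commonality_distribution(texts):
    """Hypothetical helper: map k -> number of distinct words found in exactly k texts.

    Assumption: "degree of commonality" = number of texts containing the word.
    Each element of `texts` is a list of pre-segmented word tokens.
    """
    doc_freq = Counter()
    for tokens in texts:
        # Count each word at most once per text, however often it repeats.
        doc_freq.update(set(tokens))
    # Tally how many words share each degree of commonality.
    dist = Counter(doc_freq.values())
    # Report every degree from 1 to the number of texts, including zeros.
    return {k: dist.get(k, 0) for k in range(1, len(texts) + 1)}


texts = [
    ["私", "は", "本", "を", "読む"],
    ["彼", "は", "本", "を", "買う"],
    ["私", "は", "水", "を", "飲む"],
]
print(commonality_distribution(texts))  # {1: 5, 2: 2, 3: 2}
```

Under this assumption, the value at the highest degree (here k = 3) is the number of words shared by every text, which is one possible reading of the "commonly used words" counted in finding (3).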

Links
DOI
https://doi.org/10.1075/cilt.356.08yam
