Paper

Peer-reviewed; link to full text available
September 2022

Applying Existing Datasets as a Pseudo Corpus for Sentiment Representation on Social Media

The 26th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems
  • Ryosuke Yamanishi
  • Hoshu Takemoto
  • Yoko Nishihara
  • Mitsuo Yoshida
  • Tomoko Ohsuga
  • Keizo Oyama

Start page
335
End page
342
Language
English
Publication type
Research paper (international conference proceedings)
DOI
10.1016/j.procs.2022.09.067

This paper proposes a method for representing the sentiment characteristics of opinions on social media by using existing datasets as a pseudo corpus, without any annotations. The spread of social media makes it easy to share opinions on the Web and to communicate with others. As the volume of social media data grows, so does the demand for analyzing it, for example through text classification and sentiment analysis. Existing text classification based on supervised machine learning usually requires annotated data. However, it is reasonable to say that the criteria for annotation labels differ by period, culture, and individual sense. Effective text classification under such differing criteria needs a different type of annotation for each criterion, which requires considerable time and human resources. In the proposed method, a pseudo corpus consists of multiple existing datasets with different characteristics. The classification model for each dataset is obtained by learning the pseudo corpus. The sentiment of an input text, whose domain differs from those of the learned datasets, is then represented as a likelihood distribution over the datasets in the pseudo corpus. The paper discusses the potential and limitations of this idea through an experiment.
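
The idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the abstract does not name a classifier, so a smoothed unigram (naive Bayes) model per dataset stands in for it, and the dataset names, toy texts, whitespace tokenizer, and the functions `train` and `represent` are hypothetical. The only "label" used is which dataset a text came from, and the output is a normalized score distribution over the datasets.

```python
import math
from collections import Counter

# Hypothetical toy pseudo corpus: dataset name -> texts. No sentiment labels
# are used; the only "label" is which dataset a text came from.
PSEUDO_CORPUS = {
    "movie_reviews": ["great film loved it", "terrible plot boring film"],
    "news_headlines": ["government announces new budget", "markets fall after report"],
    "product_complaints": ["broken on arrival refund please", "terrible service never again"],
}


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption, chosen for brevity)."""
    return text.lower().split()


def train(pseudo_corpus):
    """Fit one unigram count model per dataset (naive Bayes, assumed)."""
    counts = {
        name: Counter(tok for text in texts for tok in tokenize(text))
        for name, texts in pseudo_corpus.items()
    }
    vocab = set().union(*counts.values())
    n_texts = sum(len(texts) for texts in pseudo_corpus.values())
    priors = {name: len(texts) / n_texts for name, texts in pseudo_corpus.items()}
    return {"counts": counts, "vocab": vocab, "priors": priors}


def represent(model, text):
    """Return a normalized score distribution over datasets for a text."""
    vocab_size = len(model["vocab"])
    # Tokens never seen in any dataset carry no information and are skipped.
    tokens = [tok for tok in tokenize(text) if tok in model["vocab"]]
    log_scores = {}
    for name, counter in model["counts"].items():
        total = sum(counter.values())
        score = math.log(model["priors"][name])
        for tok in tokens:
            # Laplace (add-one) smoothing avoids zero likelihoods.
            score += math.log((counter[tok] + 1) / (total + vocab_size))
        log_scores[name] = score
    # Normalize in log space (softmax) so the values sum to 1.
    peak = max(log_scores.values())
    exp_scores = {name: math.exp(s - peak) for name, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {name: value / norm for name, value in exp_scores.items()}


model = train(PSEUDO_CORPUS)
distribution = represent(model, "what a boring film")
for name, prob in sorted(distribution.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {prob:.3f}")
```

In this sketch the resulting vector, one value per dataset, plays the role of the sentiment representation for the input text; a real pseudo corpus would use full datasets and likely a stronger classifier.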

Links
DOI
https://doi.org/10.1016/j.procs.2022.09.067 (link to full text available)
Identifiers
  • DOI : 10.1016/j.procs.2022.09.067
  • ISSN : 1877-0509
