September 2022
Applying Existing Datasets as a Pseudo Corpus for Sentiment Representation on Social Media
The 26th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems
- Start page
- 335
- End page
- 342
- Language
- English
- Publication type
- Research paper (international conference proceedings)
- DOI
- 10.1016/j.procs.2022.09.067
This paper proposes a method for representing the sentiment characteristics of opinions on social media by using existing datasets as a pseudo corpus, without requiring any annotations. The spread of social media enables us to easily share our opinions on the Web and to communicate with one another. As the volume of data on social media grows, so does the demand for analyzing it, e.g., through text classification and sentiment analysis. Conventional text classification based on supervised machine learning usually requires annotated data. However, the criteria behind annotation labels can reasonably be expected to differ across periods, cultures, and individual sensibilities. Effective classification under such differing criteria would need a separate set of annotations for each criterion, which demands considerable time and human effort. In the proposed method, a pseudo corpus consists of multiple existing datasets with different characteristics. A classification model for the datasets is obtained by training on the pseudo corpus. The sentiment of an input text whose domain differs from that of the training datasets is then represented as a likelihood distribution over the datasets in the pseudo corpus. Through an experiment, this paper discusses the potential and limitations of this idea.
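The core idea in the abstract, training on datasets whose only "label" is their identity and then describing a new text by a distribution over those datasets, can be illustrated with a small sketch. The abstract does not state which classifier the authors use, so this example swaps in a multinomial naive Bayes model with Laplace smoothing purely for illustration. The class `PseudoCorpusClassifier`, the dataset names `movie_reviews` and `product_reviews`, and the toy texts are all hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a deliberately simple stand-in tokenizer.
    return re.findall(r"[a-z']+", text.lower())


class PseudoCorpusClassifier:
    """Hypothetical sketch: multinomial naive Bayes whose classes are the
    source datasets of the pseudo corpus (not the paper's actual model)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, pseudo_corpus: dict[str, list[str]]) -> "PseudoCorpusClassifier":
        # pseudo_corpus maps a dataset name to its unannotated texts;
        # the dataset name is the only "label" used for training.
        self.labels = sorted(pseudo_corpus)
        total_docs = sum(len(texts) for texts in pseudo_corpus.values())
        self.log_prior, self.word_counts, self.total_words = {}, {}, {}
        vocab = set()
        for label in self.labels:
            texts = pseudo_corpus[label]
            self.log_prior[label] = math.log(len(texts) / total_docs)
            counts = Counter(tok for t in texts for tok in tokenize(t))
            self.word_counts[label] = counts
            self.total_words[label] = sum(counts.values())
            vocab.update(counts)
        self.vocab_size = len(vocab)
        return self

    def likelihood_distribution(self, text: str) -> dict[str, float]:
        # Posterior P(dataset | text), normalized so the values sum to 1.
        scores = {}
        for label in self.labels:
            denom = self.total_words[label] + self.alpha * self.vocab_size
            score = self.log_prior[label]
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            scores[label] = score
        # Subtract the max log-score before exponentiating for numerical stability.
        m = max(scores.values())
        exps = {k: math.exp(v - m) for k, v in scores.items()}
        z = sum(exps.values())
        return {k: v / z for k, v in exps.items()}


# Usage with a hypothetical two-dataset pseudo corpus.
corpus = {
    "movie_reviews": ["a wonderful film with great acting",
                      "boring plot and terrible acting"],
    "product_reviews": ["the battery lasts long and charges fast",
                        "cheap material, broke after a week"],
}
clf = PseudoCorpusClassifier().fit(corpus)
dist = clf.likelihood_distribution("great acting but a boring ending")
print(dist)
```

The resulting dictionary plays the role of the representation described in the abstract: an out-of-domain text is characterized by how strongly it resembles each constituent dataset, rather than by a manually assigned sentiment label. A real system would use far larger datasets and likely a stronger model; naive Bayes is chosen here only because it yields a normalized distribution in a few lines of standard-library code.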
- Links
- DOI
- https://doi.org/10.1016/j.procs.2022.09.067
- Identifiers
- DOI : 10.1016/j.procs.2022.09.067
- ISSN : 1877-0509