2007年
PLSA-based Topic Detection in Meetings for Adaptation of Lexicon and Language Model
INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4
- ,
- ,
- 開始ページ
- 1321
- 終了ページ
- 1324
- 記述言語
- 英語
- 掲載種別
- 研究論文(国際会議プロシーディングス)
- 出版者・発行元
- ISCA-INST SPEECH COMMUNICATION ASSOC
A topic detection approach based on a probabilistic framework is proposed to realize topic adaptation of speech recognition systems for long speech archives such as meetings. Since topics in such speech are not clearly defined unlike news stories, we adopt a probabilistic representation of topics based on probabilistic latent semantic analysis (PLSA). A topical sub-space is constructed by PLSA, and speech segments are projected to the sub-space, then each segment is represented by a vector which consists of topic probabilities obtained by the projection. Topic detection is performed by clustering these vectors, and topic adaptation is done by collecting relevant texts based on the similarity in this probabilistic representation. In experimental evaluations, the proposed approach demonstrated significant reduction of perplexity and out-of-vocabulary rates as well as robustness against ASR errors.
- リンク情報
- ID情報
-
- Web of Science ID : WOS:000269998600331