2012
LOW-LATENCY SPEAKER DIARIZATION BASED ON BAYESIAN INFORMATION CRITERION WITH MULTIPLE PHONEME CLASSES
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
- Start page
- 4189
- End page
- 4192
- Language
- English
- Publication type
- Research paper (international conference proceedings)
- DOI
- 10.1109/ICASSP.2012.6288842
- Publisher
- IEEE
Low-latency speaker diarization is desirable for online speaker adaptation in real-time speech recognition. In spontaneous conversations especially, several speakers tend to speak alternately and continuously, with no silence between utterances. We therefore propose a speaker diarization method that detects speaker-change points and identifies the speaker with a fixed low latency, using a Bayesian information criterion (BIC) computed on acoustic features classified into multiple phoneme classes. To improve diarization accuracy under the low-latency condition, the speaker decision is made continuously at each phoneme boundary. In an experiment on conversational broadcast news programs, our method reduced the speaker diarization error rate by 20.0% relative to the conventional BIC with a single phoneme class. Online speaker adaptation applied in a speech-recognition experiment reduced the word error rate at speaker-change points by 7.8% relative.
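The abstract does not state the BIC formulas, so the following is only a minimal sketch of the general idea, not the paper's implementation. It uses the standard ΔBIC test with one full-covariance Gaussian per segment, where a positive value favors a speaker change. Summing ΔBIC over phoneme classes is an assumed reading of "multiple phoneme classes"; the function names, `penalty_weight`, and `min_frames` are hypothetical.

```python
import numpy as np


def _logdet_cov(x):
    """Log-determinant of the ML full covariance of frames x (n_frames, dim)."""
    dim = x.shape[1]
    cov = np.cov(x, rowvar=False, bias=True).reshape(dim, dim)
    cov = cov + 1e-6 * np.eye(dim)  # small ridge keeps the matrix invertible
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(left, right, penalty_weight=1.0):
    """Standard delta-BIC between two segments; > 0 suggests a speaker change."""
    n1, n2 = len(left), len(right)
    n = n1 + n2
    dim = left.shape[1]
    both = np.vstack([left, right])
    # Likelihood gain of modelling the two segments with separate Gaussians.
    gain = 0.5 * (n * _logdet_cov(both)
                  - n1 * _logdet_cov(left)
                  - n2 * _logdet_cov(right))
    # Penalty for the extra mean vector and covariance matrix.
    n_params = dim + dim * (dim + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


def delta_bic_by_class(left, left_cls, right, right_cls,
                       penalty_weight=1.0, min_frames=20):
    """Assumed multi-class variant: sum delta-BIC over shared phoneme classes.

    left_cls / right_cls hold one phoneme-class label per frame. Classes with
    fewer than min_frames frames on either side are skipped.
    """
    total = 0.0
    for c in np.intersect1d(left_cls, right_cls):
        l = left[left_cls == c]
        r = right[right_cls == c]
        if len(l) >= min_frames and len(r) >= min_frames:
            total += delta_bic(l, r, penalty_weight)
    return total
```

In a low-latency setting, a test of this kind would be evaluated on a short window around each candidate phoneme boundary; the window length and the decision threshold are tuning choices that the abstract does not specify.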
- Links
- Identifiers
- DOI : 10.1109/ICASSP.2012.6288842
- ISSN : 1520-6149
- Web of Science ID : WOS:000312381404065