Paper

Peer-reviewed
2012

LOW-LATENCY SPEAKER DIARIZATION BASED ON BAYESIAN INFORMATION CRITERION WITH MULTIPLE PHONEME CLASSES

2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
  • Takahiro Oku
  • Shoei Sato
  • Akio Kobayashi
  • Shinichi Homma
  • Toru Imai

Start page
4189
End page
4192
Language
English
Publication type
Research paper (international conference proceedings)
DOI
10.1109/ICASSP.2012.6288842
Publisher
IEEE

Low-latency speaker diarization is desirable for online speaker adaptation in real-time speech recognition. In spontaneous conversations especially, several speakers tend to speak alternately and continuously, with no silence between utterances. We therefore propose a speaker diarization method that detects speaker-change points and determines the speaker with a fixed low latency on the basis of a Bayesian information criterion (BIC), using acoustic features classified into multiple phoneme classes. To improve the accuracy of speaker diarization in the low-latency condition, the speaker decision is made continuously at each phoneme boundary. In an experiment on conversational broadcast news programs, our diarization method reduced the speaker diarization error rate by a relative 20.0% compared with the conventional BIC with a single phoneme class. Online speaker adaptation applied in a speech-recognition experiment reduced the word error rate at speaker-change points by a relative 7.8%.
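
The abstract does not give the scoring formula, so the following is only a minimal sketch of the idea, not the authors' implementation. It assumes the widely used ΔBIC test for a speaker change between two adjacent windows of feature vectors, each modeled by a single full-covariance Gaussian, where a positive score favors "two speakers". The function names `delta_bic` and `delta_bic_by_class`, the `penalty_weight` and `min_frames` parameters, and the choice to sum per-class scores over phoneme classes are all assumptions made for illustration.

```python
import numpy as np


def delta_bic(x, y, penalty_weight=1.0):
    """Conventional single-class delta-BIC between two feature windows.

    x, y: arrays of shape (n_frames, n_dims). A positive return value
    favors modeling the windows with two Gaussians (a speaker change).
    """
    z = np.vstack([x, y])
    n1, n2, n = len(x), len(y), len(z)
    d = z.shape[1]

    def logdet_cov(a):
        # Maximum-likelihood covariance, lightly regularized for stability.
        cov = np.cov(a, rowvar=False, bias=True).reshape(d, d)
        cov = cov + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    # Log-likelihood gain from splitting one Gaussian into two.
    gain = 0.5 * (n * logdet_cov(z) - n1 * logdet_cov(x) - n2 * logdet_cov(y))
    # Extra parameters of the second Gaussian: mean plus full covariance.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


def delta_bic_by_class(x, x_cls, y, y_cls, min_frames=20, penalty_weight=1.0):
    """Hypothetical multi-class variant: sum delta-BIC per phoneme class.

    x_cls, y_cls: integer class label per frame (for example vowel or
    consonant). Classes with too few frames on either side are skipped.
    This combination rule is an assumption, not taken from the paper.
    """
    total = 0.0
    for c in np.intersect1d(x_cls, y_cls):
        xc, yc = x[x_cls == c], y[y_cls == c]
        if len(xc) >= min_frames and len(yc) >= min_frames:
            total += delta_bic(xc, yc, penalty_weight)
    return total
```

In a low-latency setting of the kind the abstract describes, such a score would be re-evaluated at each phoneme boundary over a short look-ahead window. Comparing like with like (frames of the same phoneme class) is one plausible reason a per-class test could be more reliable on short windows than a single pooled Gaussian.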

Links
DOI
https://doi.org/10.1109/ICASSP.2012.6288842
Web of Science
https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000312381404065&DestApp=WOS_CPL
Identifiers
  • DOI : 10.1109/ICASSP.2012.6288842
  • ISSN : 1520-6149
  • Web of Science ID : WOS:000312381404065
