Paper

Peer-reviewed
October 2010

Long-Term Spectro-Temporal and Static Harmonic Features for Voice Activity Detection

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING
  • Takashi Fukuda
  • Osamu Ichikawa
  • Masafumi Nishimura

Volume
4
Issue
5
Start page
834
End page
844
Language
English
Publication type
Research paper (scientific journal)
DOI
10.1109/JSTSP.2010.2069750
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Accurate voice activity detection (VAD) is important for robust automatic speech recognition (ASR) systems. This paper proposes a statistical-model-based noise-robust VAD algorithm using long-term temporal information and harmonic-structure-based features in speech. Long-term temporal information has recently become an ASR focus, but has not yet been deeply investigated for VAD. In this paper, we first consider the temporal features in a cepstral domain calculated over the average phoneme duration. In contrast, the harmonic structures are well-known bearers of acoustic information in human voices, but that information is difficult to exploit statistically. This paper further describes a new method to exploit the harmonic structure information with statistical models, providing additional noise robustness. The proposed method including both the long-term temporal and the static harmonic features led to considerable improvements under low SNR conditions, with 77.7% error reduction on average as compared with the ETSI AFE-VAD in our VAD testing. In addition, the word error rate was reduced by 29.1% in a test that included a full ASR system.
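As a rough illustration of what a long-term temporal feature in the cepstral domain can look like, the sketch below computes a regression-slope (delta) feature over a window of several frames on each side of the current frame. This is a generic textbook delta computation, not the authors' formulation, which the abstract does not specify. The function name `long_term_delta`, the `half_window` value of 4 frames, and the edge handling (repeating the first and last frame) are all assumptions made for this example.

```python
from typing import List, Sequence


def long_term_delta(
    cepstra: Sequence[Sequence[float]], half_window: int = 4
) -> List[List[float]]:
    """Regression-slope (delta) features over 2 * half_window + 1 frames.

    cepstra: one sequence of cepstral coefficients per frame.
    half_window: frames on each side of the current frame. The default of 4
    is an illustrative assumption (about 90 ms at a 10 ms frame shift),
    not a value taken from the paper.
    """
    if half_window < 1:
        raise ValueError("half_window must be >= 1")
    n_frames = len(cepstra)
    if n_frames == 0:
        return []

    # Normalizer of the least-squares slope: 2 * sum(k^2) for k = 1..K.
    norm = 2.0 * sum(k * k for k in range(1, half_window + 1))
    last = n_frames - 1

    deltas: List[List[float]] = []
    for t in range(n_frames):
        row = []
        for d in range(len(cepstra[t])):
            acc = 0.0
            for k in range(1, half_window + 1):
                ahead = cepstra[min(t + k, last)][d]  # repeat last frame at the end
                behind = cepstra[max(t - k, 0)][d]    # repeat first frame at the start
                acc += k * (ahead - behind)
            row.append(acc / norm)
        deltas.append(row)
    return deltas
```

For a cepstral trajectory that changes linearly over time, this feature recovers the slope at every frame far enough from the edges, and it is zero for a constant trajectory. A wider `half_window` smooths over more context at the cost of blurring short events.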

Links
DOI
https://doi.org/10.1109/JSTSP.2010.2069750
Web of Science
https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000283266800008&DestApp=WOS_CPL
Identifiers
  • DOI : 10.1109/JSTSP.2010.2069750
  • ISSN : 1932-4553
  • Web of Science ID : WOS:000283266800008
