Paper

Peer-reviewed
2012

Voice activity detection in noise using modulation spectrum of speech: Investigation of speech frequency and modulation frequency ranges

Acoustical Science and Technology
  • Kimhuoch Pek
  • Takayuki Arai
  • Noboru Kanedera

Volume
33
Issue
1
First page
33
Last page
44
Language
English
Publication type
Research paper (academic journal)
DOI
10.1250/ast.33.33

Voice activity detection (VAD) in noisy environments is a very important preprocessing scheme in speech communication technology, a field which includes speech recognition, speech coding, speech enhancement and video content captioning. We have developed a VAD method for noisy environments based on the modulation spectrum. In Experiment 1, we investigate the optimal ranges of speech and modulation frequencies for the proposed algorithm by using the simulated data in the CENSREC-1-C corpus. Results show that when we combine an upper limit frequency between 1,000 and 2,000 Hz with a lower limit frequency of less than 300 Hz as speech frequency bands, error rates are lower than with other bands. Furthermore, when we use the frequency components of the modulation spectrum between 3-9, 3-11, 3-14, 3-18, 4-9, 4-11, 4-14, 4-18, 5-7, 5-9, 5-11, or 5-14 Hz, the proposed method performs VAD well. In Experiment 2, we use one of the best parameter settings from Experiment 1 and evaluate the real environment data in the CENSREC-1-C corpus by comparing our method with other conventional methods. Improvements were observed from the VAD results for each SNR condition and noise type. © 2012 The Acoustical Society of Japan.
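
The abstract does not spell out the algorithm itself, so the following is only a rough sketch of how a modulation-spectrum feature might be computed, not the authors' method. It assumes a speech band of 0 to 2,000 Hz and a modulation band of 4 to 14 Hz, both picked from the ranges the abstract reports as working well. The function name `modulation_band_ratio`, the 25 ms frame, the 10 ms hop, and the use of a band-energy ratio as the feature are all hypothetical choices made for illustration.

```python
import numpy as np


def modulation_band_ratio(x, fs, speech_band=(0.0, 2000.0),
                          mod_band=(4.0, 14.0), frame_len=0.025, hop=0.010):
    """Return the fraction of envelope modulation energy inside mod_band.

    Hypothetical helper: band edges follow ranges named in the abstract,
    everything else (framing, ratio feature) is an assumption.
    """
    x = np.asarray(x, dtype=float)
    n = int(round(frame_len * fs))   # samples per analysis frame
    h = int(round(hop * fs))         # samples between frame starts
    if len(x) < n:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - n) // h

    window = np.hanning(n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    keep = (freqs >= speech_band[0]) & (freqs <= speech_band[1])

    # Temporal envelope: square root of the in-band energy of each frame.
    env = np.empty(n_frames)
    for i in range(n_frames):
        seg = x[i * h:i * h + n] * window
        power = np.abs(np.fft.rfft(seg)) ** 2
        env[i] = np.sqrt(power[keep].sum())

    # Modulation spectrum: power spectrum of the mean-removed envelope,
    # which is sampled at fs / h frames per second.
    env = env - env.mean()
    mod_power = np.abs(np.fft.rfft(env)) ** 2
    mod_freqs = np.fft.rfftfreq(n_frames, d=h / fs)
    in_band = (mod_freqs >= mod_band[0]) & (mod_freqs <= mod_band[1])

    total = mod_power.sum()
    return float(mod_power[in_band].sum() / total) if total > 0 else 0.0


if __name__ == "__main__":
    fs = 8000
    t = np.arange(2 * fs) / fs
    carrier = np.sin(2 * np.pi * 500 * t)
    # A 6 Hz amplitude modulation falls inside the assumed 4-14 Hz band.
    speech_like = (1 + 0.8 * np.sin(2 * np.pi * 6 * t)) * carrier
    print(modulation_band_ratio(speech_like, fs))
```

A detector built on such a feature would compare the ratio against a decision threshold over short segments; the abstract gives no threshold or segment length, so those would have to be chosen separately.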

Links
DOI
https://doi.org/10.1250/ast.33.33
Identifiers
  • DOI : 10.1250/ast.33.33
  • ISSN : 1346-3969
  • ISSN : 1347-5177
  • SCOPUS ID : 84855323672
