May 1999

On the relative importance of various components of the modulation spectrum for automatic speech recognition

SPEECH COMMUNICATION

N Kanedera
T Arai
H Hermansky
M Pavel

Volume: 28
Issue: 1
First page: 43
Last page: 55
Language: English
DOI: 10.1016/S0167-6393(99)00002-3
Publisher: ELSEVIER SCIENCE BV

We measured the accuracy of speech recognition as a function of band-pass filtering of the time trajectories of spectral envelopes. We examined (i) several types of recognizers such as dynamic time warping (DTW) and hidden Markov model (HMM), and (ii) several types of features, such as filter bank output, mel-frequency cepstral coefficients (MFCC), and perceptual linear predictive (PLP) coefficients. We used the resulting recognition data to determine the relative importance of information in different modulation spectral components of speech for automatic speech recognition. We concluded that: (1) most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz; (2) in some realistic environments, the use of components from the range below 2 Hz or above 16 Hz can degrade the recognition accuracy. (C) 1999 Elsevier Science B.V. All rights reserved.
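The central manipulation in the abstract is band-pass filtering the time trajectory of each spectral-envelope channel, keeping only a chosen range of modulation frequencies. The sketch below illustrates that idea and is not the authors' filter design. It assumes a frame rate of 100 Hz (a 10 ms hop) and substitutes an ideal FFT-domain (brick-wall) band-pass applied to a whole utterance at once. A practical front end would more likely use causal FIR or IIR filters. The function name `modulation_bandpass` and its parameters are hypothetical.

```python
import numpy as np


def modulation_bandpass(features, frame_rate=100.0, low_hz=1.0, high_hz=16.0):
    """Keep only modulation-frequency components in [low_hz, high_hz].

    features: array of shape (num_frames, num_channels). Each column is the
        time trajectory of one spectral channel, e.g. a log filter-bank output.
    frame_rate: feature frames per second (assumed 100, i.e. a 10 ms hop).

    Illustrative sketch only: this uses an ideal, non-causal FFT mask over the
    whole utterance, not the filters used in the paper.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    # A real FFT along the time axis gives the modulation spectrum of each channel.
    spectrum = np.fft.rfft(features, axis=0)
    mod_freqs = np.fft.rfftfreq(num_frames, d=1.0 / frame_rate)
    # Ideal band-pass mask: zero every modulation bin outside the passband.
    keep = (mod_freqs >= low_hz) & (mod_freqs <= high_hz)
    spectrum[~keep] = 0.0
    # Return to the time domain. n=num_frames restores the original length.
    return np.fft.irfft(spectrum, n=num_frames, axis=0)


# Usage: a 2 s synthetic trajectory with a constant offset and components
# at 0.5, 4, and 25 Hz. A 1 to 16 Hz passband keeps only the 4 Hz component.
t = np.arange(200) / 100.0
traj = (3.0 + np.sin(2 * np.pi * 0.5 * t)
        + np.sin(2 * np.pi * 4.0 * t)
        + np.sin(2 * np.pi * 25.0 * t))
filtered = modulation_bandpass(traj[:, None])
print(np.allclose(filtered[:, 0], np.sin(2 * np.pi * 4.0 * t)))
```

The sinusoid frequencies fall exactly on FFT bins (0.5 Hz spacing for 200 frames at 100 Hz), so the example avoids spectral leakage. Sweeping `low_hz` and `high_hz` and re-scoring a recognizer on the filtered features is one way to run an experiment in the spirit of the one described above.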

Links

DOI: https://doi.org/10.1016/S0167-6393(99)00002-3
Web of Science: https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000080669300004&DestApp=WOS_CPL

Identifiers

DOI: 10.1016/S0167-6393(99)00002-3
ISSN: 0167-6393
eISSN: 1872-7182
Web of Science ID: WOS:000080669300004

