November 2005
Effects of speaker normalization based on vocal tract length ratios on word recognition using compound parameters
Systems and Computers in Japan
- Volume
- 36
- Issue
- 12
- Start page
- 51
- End page
- 62
- Language
- English
- Publication type
- Research paper (academic journal)
- DOI
- 10.1002/scj.20339
This paper describes the effectiveness of applying speaker normalization, based on the vocal tract length ratio between two speakers, to spoken word recognition. One of the two speakers utters the unknown words to be recognized; the other is a standard speaker. The vocal tract length ratio between them is estimated, using the method we proposed previously, from formant trajectories of the same words uttered by both. The speech parameters of the speaker to be recognized are then normalized to the standard speaker's vocal tract length using the estimated ratio. The speech recognition system in this research is characterized by its use of compound parameters. When words uttered by speakers of diverse age and sex are recognized using a phoneme template built from a mixed speaker set (adults and children), the recognition rates after normalization are somewhat higher than those obtained without normalization using the more advantageous templates constructed separately from the adult and child sets. The same tendency is observed for both a single parameter and compound parameters. Thus, the proposed normalization method is verified to be effective for recognition when the age and sex of the speakers are unknown. In addition, the use of compound parameters is very effective regardless of whether vocal tract length ratio normalization is applied. When the IPA 5000-word dictionary is used, compound parameters improve the recognition rate by approximately 7% or more compared with a single parameter. © 2005 Wiley Periodicals, Inc.
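The normalization idea in the abstract can be illustrated with a minimal Python sketch. It is not the estimation method of the paper, which the abstract does not detail. It assumes a uniform-tube model in which formant frequencies are inversely proportional to vocal tract length, assumes the two formant trajectories are already time-aligned frame by frame, and uses a plain least-squares scale factor as a stand-in estimator. The function names `estimate_length_ratio` and `normalize_formants` are hypothetical.

```python
def estimate_length_ratio(speaker_formants, standard_formants):
    """Estimate the vocal tract length ratio (speaker / standard).

    Assumption: formant frequency is inversely proportional to tract
    length, so F_standard is approximately ratio * F_speaker. The ratio
    is the least-squares solution over all aligned frames and formants.
    """
    num = 0.0
    den = 0.0
    for spk_frame, std_frame in zip(speaker_formants, standard_formants):
        for f_spk, f_std in zip(spk_frame, std_frame):
            num += f_spk * f_std
            den += f_spk * f_spk
    if den == 0.0:
        raise ValueError("speaker formants must not all be zero")
    return num / den


def normalize_formants(speaker_formants, ratio):
    """Scale each formant frequency (Hz) toward the standard speaker."""
    return [[ratio * f for f in frame] for frame in speaker_formants]
```

Under these assumptions, a speaker whose formants are uniformly 1.2 times those of the standard speaker yields a ratio of about 1/1.2, and the scaled formants coincide with the standard speaker's values. Real speech would need formant tracking and time alignment before this step.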
- Link information
- ID information
- DOI : 10.1002/scj.20339
- ISSN : 0882-1666
- J-Global ID : 201502843105883664
- SCOPUS ID : 27144479874