Sub-Band Text-to-Speech Combining Sample-Based Spectrum with Statistically Generated Spectrum

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5

Tadashi Inai
Sunao Hara
Masanobu Abe
Yusuke Ijima
Noboru Miyazaki
Hideyuki Mizuno

First page: 264
Last page: 268
Language: English
Publication type: Research paper (international conference proceedings)
Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC

In this paper, we propose a sub-band speech synthesis approach for building a high-quality text-to-speech (TTS) system: a sample-based spectrum is used in the high-frequency band, and a spectrum generated by HMM-based TTS is used in the low-frequency band. Here, a sample-based spectrum is a spectrum selected from a phoneme database as the one most similar to the spectrum generated by HMM-based speech synthesis. The key idea is to compensate for the over-smoothing caused by statistical procedures by introducing a sample-based spectrum, especially in the high-frequency band. Listening test results show that the proposed method outperforms HMM-based speech synthesis in terms of clarity and is at the same level in terms of smoothness. In addition, preference test results comparing the proposed method, HMM-based speech synthesis, and waveform speech synthesis using 80 min of speech data show that the proposed method is the most preferred.
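The two steps named in the abstract (selecting the database spectrum most similar to the HMM-generated one, then combining the two spectra by frequency band) can be illustrated with a minimal sketch. This is not the authors' implementation: the distance measure (RMS difference of log magnitudes), the function names, the cutoff bin, and the toy spectra are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def log_spectral_distance(a, b):
    """RMS difference between two log-magnitude spectra.

    Assumed similarity measure; the abstract does not say which one is used.
    """
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bins")
    total = sum((math.log(x) - math.log(y)) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(a))

def select_sample(generated, candidates):
    """Return the candidate spectrum closest to the generated spectrum."""
    return min(candidates, key=lambda c: log_spectral_distance(generated, c))

def combine_sub_bands(generated, sample, cutoff_bin):
    """Take bins below cutoff_bin from the generated spectrum (low band)
    and bins from cutoff_bin upward from the sample spectrum (high band)."""
    if len(generated) != len(sample):
        raise ValueError("spectra must have the same number of bins")
    if not 0 <= cutoff_bin <= len(generated):
        raise ValueError("cutoff_bin is out of range")
    return list(generated[:cutoff_bin]) + list(sample[cutoff_bin:])

# Toy magnitude spectra (hypothetical values, 6 frequency bins each).
generated = [1.0, 0.8, 0.5, 0.3, 0.2, 0.2]   # stands in for the HMM output
candidates = [                                # stands in for the phoneme database
    [1.1, 0.7, 0.6, 0.5, 0.1, 0.4],
    [0.2, 0.2, 0.3, 0.9, 1.0, 0.8],
]

best = select_sample(generated, candidates)
hybrid = combine_sub_bands(generated, best, cutoff_bin=3)  # cutoff is arbitrary here
print(hybrid)
```

In this toy case the first candidate is selected, so the hybrid spectrum keeps the smooth generated values in the three lowest bins and takes the less smoothed sample values above them. A real system would work frame by frame and would need some treatment of the band boundary, which this sketch leaves out.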

Links

Web of Science: https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000380581600054&DestApp=WOS_CPL

Identifiers

Web of Science ID : WOS:000380581600054
