March 12, 2021

Text to speech system for low resource languages by cross-lingual transfer learning and data augmentation

Reports of the Meeting of the Acoustical Society of Japan (日本音響学会研究発表会講演論文集)

ZOLZAYA BYAMBADORJ
Ryota Nishimura (西村良太)
Altangerel Ayush
Kengo Ohta (太田健吾)
Norihide Kitaoka (北岡教英)

Language: English

In this paper, we propose several TTS systems, each consisting of a spectrogram prediction network and a neural vocoder, for use when only a small amount of target-language data is available. We trained some models using only transfer learning and others using only data augmentation to test how each method affects the naturalness of the TTS output. We found that training the TTS model with both methods combined improved performance, narrowing the gap between our low-resource model and the baseline M-MN model, which was trained on a larger amount of original target speech data. We also trained the Parallel WaveGAN vocoder on the same augmented data. As a result, the vocoder trained with our proposed method achieved almost the same speech quality as one trained on the entire target-language corpus.

Links

URL: https://web.db.tokushima-u.ac.jp/cgi-bin/edb_browse?EID=374237
