Presentations and Oral Talks

March 12, 2021

Text to speech system for low resource languages by cross-lingual transfer learning and data augmentation

Proceedings of the Meeting of the Acoustical Society of Japan
  • ZOLZAYA BYAMBADORJ
  • 西村 良太
  • Altangerel Ayush
  • 太田 健吾
  • 北岡 教英

Language
English
Conference type

In this paper, we propose several TTS systems, each consisting of a spectrogram prediction network and a neural vocoder, for use when only a small amount of target-language data is available. To test how each method affects the naturalness of the TTS output, we trained some models using transfer learning alone and others using data augmentation alone. We found that training the TTS model with both methods combined improved performance, narrowing the gap between our low-resource model and the baseline M-MN model, which was trained on a larger amount of original target speech data. We also trained the Parallel WaveGAN vocoder on the same augmented data. As a result, our proposed method achieved nearly the same speech quality as a vocoder trained on the entire target-language corpus.

Links
URL
https://web.db.tokushima-u.ac.jp/cgi-bin/edb_browse?EID=374237