Presentations

Mar 12, 2021

Text to speech system for low resource languages by cross-lingual transfer learning and data augmentation

Proceedings of the Acoustical Society of Japan Research Meeting (日本音響学会研究発表会講演論文集)
  • ZOLZAYA BYAMBADORJ
  • Nishimura Ryota
  • Altangerel Ayush
  • 太田 健吾
  • Kitaoka Norihide

Language
English
Presentation type

In this paper, we propose several TTS systems, each comprising a spectrogram prediction network and a neural vocoder, for use when only a small amount of target-language data is available. We trained some models using only transfer learning and others using only data augmentation to test how each method affects the naturalness of the synthesized speech. We found that training the TTS model with both methods together improved performance, narrowing the gap between our low-resource model and the baseline M-MN model, which was trained with a larger amount of original target speech data. We also trained the Parallel WaveGAN vocoder on the same augmented data. As a result, our proposed method achieved almost the same speech quality as the vocoder trained with the entire corpus of target-language data.

Link information
URL
https://web.db.tokushima-u.ac.jp/cgi-bin/edb_browse?EID=374237