Paper

Peer-reviewed
October 2019

Emotional Voice Conversion Using Dual Supervised Adversarial Networks With Continuous Wavelet Transform F0 Features

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
  • Zhaojie Luo
  • Jinhui Chen
  • Tetsuya Takiguchi
  • Yasuo Ariki

Volume
27
Issue
10
Start page
1535
End page
1548
Language
English
Publication type
Research paper (academic journal)
DOI
10.1109/TASLP.2019.2923951
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

In emotional voice conversion (VC) tasks, it is difficult to work with a simple representation of the fundamental frequency (F0), which is the most important feature in emotional voice representation. To address this issue, we propose the adaptive scales continuous wavelet transform (ADS-CWT) method to systematically capture F0 features at different temporal levels, which can represent different prosodic aspects, ranging from micro-prosody to sentences. Moreover, in an emotional VC task, each dataset pairs labeled emotional voice with neutral voice, which can be regarded as a dual task. Two properties motivate our approach: first, dual supervised learning can improve training performance by leveraging the probabilistic connection between the dual tasks to enhance learning from labeled data; second, generative adversarial networks (GANs) can mitigate the over-smoothing problem caused in the low-level data space when converting acoustic features. We therefore present a novel training framework for emotional VC that combines GANs with dual supervised learning, named dual supervised adversarial networks. In emotional VC experiments, we confirmed the high similarity performance of our method when using limited labeled data. Our method achieves good and consistent performance in both objective and subjective evaluations.
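
To illustrate what decomposing F0 into "different temporal levels" can look like, here is a minimal sketch of a continuous wavelet transform applied to a log-F0 contour. It is not the ADS-CWT method of the paper: the adaptive scale selection is not described in this record, so the sketch assumes a fixed set of ten dyadic scales, a Mexican hat mother wavelet, a 5 ms frame shift, and a synthetic contour that is already continuous (no unvoiced gaps). The names `mexican_hat` and `cwt_f0` are hypothetical.

```python
import numpy as np


def mexican_hat(u):
    """Mexican hat (Ricker) mother wavelet evaluated at dimensionless time u."""
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - u ** 2) * np.exp(-0.5 * u ** 2)


def cwt_f0(log_f0, scales, dt):
    """Direct-sum CWT of a continuous log-F0 contour.

    Returns an array of shape (len(scales), len(log_f0)); row i holds the
    coefficients at scales[i] (in seconds) for every frame position.
    """
    log_f0 = np.asarray(log_f0, dtype=float)
    # Normalize to zero mean and unit variance before the transform.
    x = (log_f0 - log_f0.mean()) / log_f0.std()
    t = np.arange(len(x)) * dt
    coeffs = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):
        # u[b, k] = (t_k - t_b) / s, so each row is the wavelet centered on frame b.
        u = (t[None, :] - t[:, None]) / s
        coeffs[i] = mexican_hat(u) @ x * dt / np.sqrt(s)
    return coeffs


# Assumed analysis settings (not taken from the paper): 5 ms frames,
# ten scales one octave apart, starting at twice the frame shift.
frame_shift = 0.005
scales = frame_shift * 2.0 ** np.arange(1, 11)

# Synthetic 1-second contour: a slow phrase-like movement plus a fast ripple.
t = np.arange(200) * frame_shift
log_f0 = (np.log(120.0)
          + 0.20 * np.sin(2 * np.pi * 1.0 * t)
          + 0.02 * np.sin(2 * np.pi * 40.0 * t))

coeffs = cwt_f0(log_f0, scales, dt=frame_shift)
energy_per_scale = (coeffs ** 2).sum(axis=1)
print(coeffs.shape)
print(np.round(energy_per_scale, 4))
```

In this sketch, small scales respond to fast, micro-prosody-like variation and larger scales to slower, phrase-level movement, which is the general idea behind using CWT coefficients rather than a single raw F0 value per frame.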

Links
DOI
https://doi.org/10.1109/TASLP.2019.2923951
DBLP
https://dblp.uni-trier.de/rec/journals/taslp/LuoCTA19
Web of Science
https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000473621000004&DestApp=WOS_CPL
DBLP URL
https://dblp.uni-trier.de/db/journals/taslp/taslp27.html#LuoCTA19
URL
https://publons.com/wos-op/publon/36111242/
Identifiers
  • DOI : 10.1109/TASLP.2019.2923951
  • ISSN : 2329-9290
  • eISSN : 2329-9304
  • DBLP ID : journals/taslp/LuoCTA19
  • ORCID Put Code : 135826040
  • Web of Science ID : WOS:000473621000004
