Paper

Peer-reviewed
October 2020

On Sparsity of Speech Features with Ladder Autoencoders for Multi-Speaker Separation

Proceedings - 2020 4th International Conference on Imaging, Signal Processing and Communications, ICISPC 2020
  • Hiroshi Sekiguchi
  • Yoshiaki Narusue
  • Hiroyuki Morikawa

Start page
39
End page
43
Language
Publication type
Research paper (international conference proceedings)
DOI
10.1109/ICISPC51671.2020.00015

The multi-speaker separation mechanism consists of speech feature extraction and temporal coherence. This study develops a speech feature extraction method and evaluates the quality of the reconstructed speech at different degrees of sparsity. The feature extraction is implemented with ladder autoencoders whose branches embody a sparse encoder-decoder model, and the autoencoders are trained on the WSJ0-2mix English corpus. The evaluation shows that the reconstructed-speech quality is stable, with a signal-to-distortion ratio above 5 dB, for sparseness values between 0.4 and 0.7. These results suggest that the feature extraction method can be applied to the investigation of temporal coherence.
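The abstract reports two quantities: a signal-to-distortion ratio (SDR) and a degree of sparseness. The sketch below shows one way such numbers could be computed. It is an illustration under stated assumptions, not the authors' evaluation code. The SDR here is the plain energy-ratio form, 10·log10(‖s‖² / ‖s − ŝ‖²), which may differ from the BSS-Eval variant commonly used with WSJ0-2mix. The abstract does not say which sparseness measure was used; Hoyer's measure is assumed only because it ranges from 0 (dense) to 1 (one-hot). The function names `sdr_db` and `hoyer_sparseness` are hypothetical.

```python
import math
from typing import Sequence


def sdr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Plain SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    Assumption: simple energy-ratio definition, not the BSS-Eval SDR.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal_energy == 0.0:
        raise ValueError("reference signal has zero energy")
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


def hoyer_sparseness(features: Sequence[float]) -> float:
    """Hoyer sparseness: (sqrt(n) - L1/L2) / (sqrt(n) - 1), in [0, 1].

    Assumption: the paper's sparseness measure is not stated in the
    abstract; this measure is used here purely for illustration.
    """
    n = len(features)
    l1 = sum(abs(x) for x in features)
    l2 = math.sqrt(sum(x * x for x in features))
    if n < 2 or l2 == 0.0:
        raise ValueError("need at least two values and a non-zero vector")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)
```

With these definitions, a feature vector whose energy sits in a single element scores 1.0, a vector with equal-magnitude elements scores 0.0, and an estimate that scales the reference by 0.9 yields an SDR of about 20 dB.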

Links
DOI
https://doi.org/10.1109/ICISPC51671.2020.00015
Scopus
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85107808658&origin=inward
Scopus Citedby
https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85107808658&origin=inward
Identifiers
  • DOI : 10.1109/ICISPC51671.2020.00015
  • SCOPUS ID : 85107808658
