On Sparsity of Speech Features with Ladder Autoencoders for Multi-Speaker Separation

Proceedings - 2020 4th International Conference on Imaging, Signal Processing and Communications, ICISPC 2020

Hiroshi Sekiguchi
Yoshiaki Narusue
Hiroyuki Morikawa

Start page: 39
End page: 43
Language
Publication type: Research paper (international conference proceedings)
DOI: 10.1109/ICISPC51671.2020.00015

The multi-speaker separation mechanism consists of two components: speech feature extraction and temporal coherence. This study develops a speech feature extraction method and evaluates the quality of the reconstructed speech at different degrees of sparsity. The feature extraction is implemented with ladder autoencoders whose branches embody a sparse encoder-decoder model; the autoencoders are trained on the WSJ0-2mix English corpus. The evaluation shows that the reconstructed-speech quality is stable, with a signal-to-distortion ratio above 5 dB for sparseness values between 0.4 and 0.7. These results suggest that the feature extraction method is applicable to the investigation of temporal coherence.
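The abstract reports two quantities, a signal-to-distortion ratio (SDR) and a sparseness value, without stating how either is computed. The sketch below illustrates what such numbers can mean, under two assumptions that do not come from this record: SDR is taken in its simplest energy-ratio form (the paper may use a toolkit definition such as BSS Eval instead), and sparseness is taken to be Hoyer's measure, which is one common choice that ranges from 0 to 1. The function names `sdr_db` and `hoyer_sparseness` are illustrative only.

```python
import numpy as np


def sdr_db(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: plain energy ratio, not the BSS Eval decomposition.
    Does not guard against a zero-energy error signal.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))


def hoyer_sparseness(x):
    """Hoyer sparseness: 0 for a uniform vector, 1 for a one-hot vector.

    Assumption: the record does not name its sparseness measure;
    this is only one common definition. Requires a non-zero vector
    with more than one element.
    """
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    l1 = np.sum(np.abs(x))
    l2 = np.sqrt(np.sum(x ** 2))
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)


if __name__ == "__main__":
    clean = np.array([1.0, 0.0, -1.0, 0.0])
    reconstructed = 0.9 * clean  # toy reconstruction with a 10% amplitude error
    print("SDR (dB):", sdr_db(clean, reconstructed))
    print("Sparseness of a one-hot code:", hoyer_sparseness([0.0, 0.0, 0.0, 1.0]))
```

Under these assumptions, an SDR above 5 dB means the energy of the reference signal is more than about three times the energy of the reconstruction error.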

Links

DOI: https://doi.org/10.1109/ICISPC51671.2020.00015
Scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85107808658&origin=inward
Scopus Citedby: https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85107808658&origin=inward

ID information

DOI: 10.1109/ICISPC51671.2020.00015
Scopus ID: 85107808658
