October 2020
On Sparsity of Speech Features with Ladder Autoencoders for Multi-Speaker Separation
Proceedings - 2020 4th International Conference on Imaging, Signal Processing and Communications, ICISPC 2020
- Start page
- 39
- End page
- 43
- Language
- Publication type
- Research paper (international conference proceedings)
- DOI
- 10.1109/ICISPC51671.2020.00015
The multi-speaker separation mechanism consists of speech feature extraction and temporal coherence. This study develops a speech feature extraction method and evaluates the quality of the reconstructed speech at different degrees of sparsity. The feature extraction is implemented with ladder autoencoders whose branches embody a sparse encoder-decoder model, and the autoencoders are trained on the WSJ0-2mix English corpus. The evaluation indicates that the reconstructed-speech quality is stable, with a signal-to-distortion ratio above 5 dB for sparseness values in the range 0.4-0.7. The results suggest that the feature extraction method is applicable to the investigation of temporal coherence.
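The two quantities named in the abstract, sparseness and signal-to-distortion ratio (SDR), can be illustrated with a minimal sketch. The abstract does not say which sparseness measure or which SDR variant the paper uses, so both choices here are assumptions: Hoyer's sparseness, which lies between 0 and 1, and a plain energy-ratio SDR without the distortion decomposition of BSS Eval. The function names `hoyer_sparseness` and `sdr_db` are hypothetical.

```python
import numpy as np


def hoyer_sparseness(x, eps=1e-12):
    """Hoyer sparseness (assumed measure, not confirmed by the abstract).

    Returns about 0 for a vector with equal magnitudes and about 1 for a
    vector with a single non-zero entry. Requires at least two elements.
    """
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    if n < 2:
        raise ValueError("need at least two elements")
    l1 = np.sum(np.abs(x))
    l2 = np.sqrt(np.sum(x ** 2))
    return (np.sqrt(n) - l1 / (l2 + eps)) / (np.sqrt(n) - 1.0)


def sdr_db(reference, estimate, eps=1e-12):
    """Plain energy-ratio SDR in dB (assumed variant).

    Computes 10 * log10(||reference||^2 / ||reference - estimate||^2).
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = reference - estimate
    num = np.sum(reference ** 2) + eps
    den = np.sum(distortion ** 2) + eps
    return 10.0 * np.log10(num / den)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical feature vector: mostly zeros with a few active units.
    features = np.zeros(64)
    features[rng.choice(64, size=8, replace=False)] = rng.random(8)
    print("sparseness:", hoyer_sparseness(features))

    # Hypothetical reference signal and a slightly noisy reconstruction.
    clean = np.sin(np.linspace(0.0, 20.0, 1000))
    recon = clean + 0.1 * rng.standard_normal(1000)
    print("SDR (dB):", sdr_db(clean, recon))
```

Under these assumptions, a result such as the one reported would correspond to `sdr_db` staying above 5 while the sparseness of the encoded features is varied between 0.4 and 0.7.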
- Links
- ID information
- DOI : 10.1109/ICISPC51671.2020.00015
- SCOPUS ID : 85107808658