Combined multi-channel NMF-based robust beamforming for noisy speech recognition

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Masato Mimura
Yoshiaki Bando
Kazuki Shimada
Shinsuke Sakai
Kazuyoshi Yoshii
Tatsuya Kawahara

Volume: 2017-
Start page: 2451
End page: 2455
Language: English
Publication type: Research paper (international conference proceedings)
DOI: 10.21437/Interspeech.2017-642
Publisher: International Speech Communication Association

We propose a novel acoustic beamforming method using blind source separation (BSS) techniques based on non-negative matrix factorization (NMF). In conventional mask-based approaches, hard or soft masks are estimated, and beamforming is performed using speech and noise spatial covariance matrices calculated from the masked noisy observations; however, the phase information of the target speech is not adequately preserved. In the proposed method, we perform complex-domain source separation based on multi-channel NMF with a rank-1 spatial model (rank-1 MNMF) and use the separated speech observation in each time-frequency bin to obtain a speech spatial covariance matrix, from which a steering vector for the target speech is estimated. This accurate steering vector estimation is effectively combined with our novel noise mask prediction method using multi-channel robust NMF (MRNMF) to construct a maximum likelihood (ML) beamformer that achieved better speech recognition performance than a state-of-the-art DNN-based beamformer, without any environment-specific training. ASR experiments also confirm the superiority of phase-preserving source separation over real-valued masks in beamforming.
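
The generic beamforming stage summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: rank-1 MNMF and MRNMF are not reproduced, and the separated speech image (`speech_image`) and the noise mask (`noise_mask`) are simply assumed to be given as inputs. Under these assumptions, the speech spatial covariance matrix (SCM) is computed from the speech image, the steering vector is taken as its principal eigenvector (a common choice, assumed here), the noise SCM is a mask-weighted average over the noisy observations, and the weights use the form w = R_n^{-1} h / (h^H R_n^{-1} h), which is what ML estimation of the speech signal yields under a complex Gaussian noise model. All function and parameter names are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating a steering-vector + ML (MVDR-form)
# beamformer; this is not the paper's code. Arrays use shape (F, T, M):
# frequency bins, time frames, microphones.


def spatial_covariance(X, weights=None):
    """Per-frequency SCM: sum_t w[f,t] x x^H / sum_t w[f,t], shape (F, M, M)."""
    if weights is None:
        weights = np.ones(X.shape[:2])
    R = np.einsum("ft,ftm,ftn->fmn", weights, X, X.conj())
    return R / np.maximum(weights.sum(axis=1), 1e-10)[:, None, None]


def steering_vector(R_s):
    """Unit-norm principal eigenvector of each speech SCM, shape (F, M)."""
    # eigh sorts eigenvalues in ascending order, so the last column
    # is the eigenvector of the largest eigenvalue.
    _, vecs = np.linalg.eigh(R_s)
    return vecs[:, :, -1]


def ml_weights(h, R_n, eps=1e-6):
    """w[f] = R_n^{-1} h / (h^H R_n^{-1} h), shape (F, M)."""
    M = h.shape[-1]
    # Diagonal loading (an added assumption) keeps R_n invertible.
    R_n = R_n + eps * np.eye(M)[None]
    Rinv_h = np.linalg.solve(R_n, h[:, :, None])[:, :, 0]
    denom = np.einsum("fm,fm->f", h.conj(), Rinv_h)
    return Rinv_h / denom[:, None]


def beamform(X, speech_image, noise_mask):
    """Return the enhanced signal y[f,t] = w[f]^H x[f,t], plus w and h."""
    R_s = spatial_covariance(speech_image)   # speech SCM from separated speech
    R_n = spatial_covariance(X, noise_mask)  # noise SCM from masked mixture
    h = steering_vector(R_s)
    w = ml_weights(h, R_n)
    y = np.einsum("fm,ftm->ft", w.conj(), X)
    return y, w, h
```

With an exactly rank-1 speech image, the estimated steering vector matches the true one up to a complex scale factor, and by construction the weights satisfy the distortionless constraint w^H h = 1 in every frequency bin.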

Links

DOI: https://doi.org/10.21437/Interspeech.2017-642

Identifiers

DOI: 10.21437/Interspeech.2017-642
ISSN: 1990-9772
ISSN: 2308-457X
Scopus ID: 85039163147
