Paper

Peer-reviewed
2017

Combined multi-channel NMF-based robust beamforming for noisy speech recognition

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
  • Masato Mimura
  • Yoshiaki Bando
  • Kazuki Shimada
  • Shinsuke Sakai
  • Kazuyoshi Yoshii
  • Tatsuya Kawahara

2017-
Start page
2451
End page
2455
Language
English
Publication type
Research paper (international conference proceedings)
DOI
10.21437/Interspeech.2017-642
Publisher
International Speech Communication Association

We propose a novel acoustic beamforming method using blind source separation (BSS) techniques based on non-negative matrix factorization (NMF). In conventional mask-based approaches, hard or soft masks are estimated and beamforming is performed using speech and noise spatial covariance matrices calculated from masked noisy observations, but the phase information of the target speech is not adequately preserved. In the proposed method, we perform complex-domain source separation based on multi-channel NMF with rank-1 spatial model (rank-1 MNMF) to obtain a speech spatial covariance matrix for estimating a steering vector for the target speech utilizing the separated speech observation in each time-frequency bin. This accurate steering vector estimation is effectively combined with our novel noise mask prediction method using multi-channel robust NMF (MRNMF) to construct a Maximum Likelihood (ML) beamformer that achieved a better speech recognition performance than a state-of-the-art DNN-based beamformer with no environment-specific training. Superiority of the phase-preserving source separation to real-valued masks in beamforming is also confirmed through ASR experiments.
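
The abstract describes estimating a steering vector from a speech spatial covariance matrix and using it, together with a noise covariance estimate, in a beamformer. The following is a minimal sketch of that final stage only, for a single frequency bin. It does not implement rank-1 MNMF or MRNMF: the covariance matrices come from synthetic toy signals, and the weights use the generic distortionless form w = R_n^{-1} h / (h^H R_n^{-1} h), which is an assumption for illustration and may differ from the paper's ML beamformer. All function and variable names are hypothetical.

```python
import numpy as np


def steering_vector(speech_scm):
    """Principal eigenvector of a Hermitian speech spatial covariance matrix."""
    _, eigvecs = np.linalg.eigh(speech_scm)  # eigenvalues come back in ascending order
    h = eigvecs[:, -1]                       # eigenvector of the largest eigenvalue
    return h / h[0]                          # fix the arbitrary scale and phase to channel 0


def distortionless_weights(noise_scm, h):
    """Weights w = R_n^{-1} h / (h^H R_n^{-1} h), which satisfy w^H h = 1."""
    rn_inv_h = np.linalg.solve(noise_scm, h)
    return rn_inv_h / (h.conj() @ rn_inv_h)


# Toy data for one frequency bin (assumed setup, not taken from the paper).
rng = np.random.default_rng(0)
n_mics, n_frames = 4, 200
true_h = np.exp(-1j * np.pi * 0.3 * np.arange(n_mics))  # hypothetical steering vector
source = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
speech = true_h[:, None] * source                        # rank-1 speech image at the mics
noise = 0.5 * (rng.standard_normal((n_mics, n_frames))
               + 1j * rng.standard_normal((n_mics, n_frames)))

# Spatial covariance matrices; in the paper these would come from the
# separated speech and the predicted noise mask rather than oracle signals.
speech_scm = speech @ speech.conj().T / n_frames
noise_scm = noise @ noise.conj().T / n_frames

h = steering_vector(speech_scm)
w = distortionless_weights(noise_scm, h)
enhanced = w.conj() @ (speech + noise)                   # beamformer output, one value per frame
```

Because the toy speech component is exactly rank-1, the estimated steering vector matches `true_h` after normalization, and the beamformer passes the speech component through unchanged while attenuating the noise.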

Links
DOI
https://doi.org/10.21437/Interspeech.2017-642
Identifiers
  • DOI : 10.21437/Interspeech.2017-642
  • ISSN : 1990-9772
  • ISSN : 2308-457X
  • SCOPUS ID : 85039163147
