Paper

Peer-reviewed
2018

GENERATING SOUND WORDS FROM AUDIO SIGNALS OF ACOUSTIC EVENTS WITH SEQUENCE-TO-SEQUENCE MODEL

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
  • Shota Ikawa
  • Kunio Kashino

First page
346
Last page
350
Language
English
Publication type
Research paper (international conference proceedings)
DOI
10.1109/ICASSP.2018.8462034
Publisher
IEEE

Representing sounds in language as sound words, or onomatopoeias, is not only useful as an auxiliary means for automatic speech recognition but also essential in emerging fields such as natural human-machine communication, searching audio archives for acoustic events, and sound-based abnormality detection. This paper proposes a novel method for generating sound words from audio signals. The method uses an end-to-end, sequence-to-sequence framework to address two problems: the audio segmentation problem of finding the segment of the audio signal along time that corresponds to a sequence of phonemes, and the ambiguity problem, in which multiple words may correspond to the same sound depending on the situation or the listener. Our tests show that the method worked efficiently, achieving a 2.8% mean phoneme error rate (MPER) and a 7.2% word error rate (WER) in a sound word generation task.
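The abstract reports results as a mean phoneme error rate and a word error rate but this record does not define either metric. The Python sketch below shows one common way such figures could be computed, under stated assumptions: the per-sample phoneme error rate is taken to be the Levenshtein edit distance between generated and reference phoneme sequences divided by the reference length, MPER is its average over samples, and WER is the fraction of generated sound words that do not match the reference exactly. These definitions, the function names, and the toy phoneme sequences are illustrative assumptions, not the authors' evaluation code or data.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mean_phoneme_error_rate(refs: List[Sequence[str]],
                            hyps: List[Sequence[str]]) -> float:
    """Average of per-sample edit distance / reference length (assumed definition)."""
    rates = [edit_distance(r, h) / len(r) for r, h in zip(refs, hyps)]
    return sum(rates) / len(rates)


def word_error_rate(refs: List[Sequence[str]],
                    hyps: List[Sequence[str]]) -> float:
    """Fraction of sound words that differ from the reference (assumed definition)."""
    wrong = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    return wrong / len(refs)


# Hypothetical toy data: two reference sound words and two generated ones.
refs = [["p", "i", "p", "i"], ["g", "a", "N"]]
hyps = [["p", "i", "p", "o"], ["g", "a", "N"]]

print(mean_phoneme_error_rate(refs, hyps))  # one substitution in four phonemes, then a perfect match
print(word_error_rate(refs, hyps))          # one of two words is wrong
```

Because the abstract notes that several sound words can be valid for the same sound, an evaluation may also score a generated word against multiple acceptable references; this sketch compares against a single reference only.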

Links
DOI
https://doi.org/10.1109/ICASSP.2018.8462034
DBLP
https://dblp.uni-trier.de/rec/conf/icassp/IkawaK18
Web of Science
https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000446384600069&DestApp=WOS_CPL
DBLP Cross Ref
https://dblp.uni-trier.de/conf/icassp/2018
DBLP URL
https://dblp.uni-trier.de/db/conf/icassp/icassp2018.html#IkawaK18
IDs
  • DOI : 10.1109/ICASSP.2018.8462034
  • DBLP ID : conf/icassp/IkawaK18
  • Web of Science ID : WOS:000446384600069
