Paper

November 30, 2010

A Method for Determining the Timing of Displaying the Speaker's Face and Captions for a Real-Time Speech-to-Caption System

JCMSI : SICE journal of control, measurement, and system integration (SICE JCMSI)
  • KUROKI Hayato
  • INO Shuichi
  • NAKANO Satoko
  • HORI Kotaro
  • IFUKUBE Tohru

Volume
3
Issue
6
Start page
402
End page
408
Language
English
Publication type
DOI
10.9746/jcmsi.3.402
Publisher
The Society of Instrument and Control Engineers (公益社団法人 計測自動制御学会)

The authors of this paper have been studying a real-time speech-to-caption system that uses speech recognition technology with a “repeat-speaking” method. In this system, a “repeat-speaker” listens to a lecturer's voice and then repeats the lecturer's utterances into a speech recognition computer. The fully developed system achieved a caption accuracy of about 97% in Japanese-to-Japanese conversion and a voice-to-caption conversion time of about 4 seconds in English-to-English conversion at some international conferences. Achieving this level of performance, however, was costly. In human communication, speech understanding depends not only on verbal information but also on non-verbal information such as the speaker's gestures and face and mouth movements. The authors therefore proposed storing the captions and the images of the speaker's face movements briefly in a computer and then displaying them in a way that improves comprehension. This paper investigates the relationship between the display sequence and display timing of captions that contain speech recognition errors and of the speaker's face movement images. The results show that the sequence “to display the caption before the speaker's face image” improves comprehension of the captions. The sequence “to display both simultaneously” shows an improvement of only a few percent relative to the question sentence, and the sequence “to display the speaker's face image before the caption” shows almost no change. In addition, the sequence “to display the caption 1 second before the speaker's face” shows the most significant improvement of all the conditions.

Links
DOI
https://doi.org/10.9746/jcmsi.3.402
CiNii Articles
http://ci.nii.ac.jp/naid/10031140034
CiNii Books
http://ci.nii.ac.jp/ncid/AA12293218
URL
https://jlc.jst.go.jp/DN/JALC/00362070277?from=CiNii
Identifiers
  • DOI : 10.9746/jcmsi.3.402
  • ISSN : 1882-4889
  • CiNii Articles ID : 10031140034
  • CiNii Books ID : AA12293218
