November 30, 2010

A Method for Determining the Timing of Displaying the Speaker's Face and Captions for a Real-Time Speech-to-Caption System

JCMSI : SICE journal of control, measurement, and system integration (SICE JCMSI)

KUROKI Hayato
INO Shuichi
NAKANO Satoko
HORI Kotaro
IFUKUBE Tohru

Volume: 3
Issue: 6
First page: 402
Last page: 408
Language: English
DOI: 10.9746/jcmsi.3.402
Publisher: The Society of Instrument and Control Engineers (SICE)

The authors have been studying a real-time speech-to-caption system that uses speech recognition technology with a “repeat-speaking” method. In this system, a “repeat-speaker” listens to the lecturer's voice and repeats the lecturer's utterances into a speech recognition computer. The system achieved a caption accuracy of about 97% in Japanese-Japanese conversion and a voice-to-caption conversion time of about 4 seconds in English-English conversion at some international conferences. Achieving this level of performance, however, was costly. In human communication, speech understanding depends not only on verbal information but also on non-verbal information such as the speaker's gestures and face and mouth movements. The authors therefore propose storing the captions and the speaker's face movement images briefly in a computer and then displaying them in a suitable way to achieve higher comprehension. This paper investigates the display sequence and display timing of captions that contain speech recognition errors relative to the speaker's face movement images. The results show that the sequence “to display the caption before the speaker's face image” improves comprehension of the captions. The sequence “to display both simultaneously” yields an improvement only a few percent higher than the question sentence, and the sequence “to display the speaker's face image before the caption” shows almost no change. In addition, the sequence “to display the caption 1 second before the speaker's face” shows the most significant improvement of all the conditions.
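The timing relationship described in the abstract can be illustrated with a minimal sketch. It assumes the face video is buffered and released with a fixed offset from the caption. The names `DisplaySchedule`, `schedule_display`, `conversion_delay`, and `caption_lead` are hypothetical and do not come from the paper. The default values reuse the figures quoted in the abstract (about 4 seconds of conversion time, a 1-second caption lead). This is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplaySchedule:
    """Display times, in seconds on the lecture clock, for one utterance."""
    caption_at: float  # when the caption appears
    face_at: float     # when the buffered face video for the utterance starts


def schedule_display(utterance_start: float,
                     conversion_delay: float = 4.0,
                     caption_lead: float = 1.0) -> DisplaySchedule:
    """Compute when to show the caption and the buffered face video.

    conversion_delay: assumed voice-to-caption latency in seconds.
    caption_lead: how many seconds the caption precedes the face video.
        Positive means caption first, zero means simultaneous,
        negative means face first.
    """
    if conversion_delay < 0:
        raise ValueError("conversion_delay must be non-negative")
    # The caption cannot appear before speech recognition has produced it.
    caption_at = utterance_start + conversion_delay
    # The face video is held in the buffer until the caption has led by caption_lead.
    face_at = caption_at + caption_lead
    if face_at < utterance_start:
        raise ValueError("face video cannot be shown before it is recorded")
    return DisplaySchedule(caption_at=caption_at, face_at=face_at)


if __name__ == "__main__":
    # Caption 1 second before the face, the condition the abstract reports as best.
    print(schedule_display(10.0))
    # Simultaneous display.
    print(schedule_display(10.0, caption_lead=0.0))
```

Under these assumptions, the face video is delayed by the conversion time plus the caption lead, so the "caption 1 second before the face" condition implies buffering the video for about 5 seconds.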

Links

DOI: https://doi.org/10.9746/jcmsi.3.402
CiNii Articles: http://ci.nii.ac.jp/naid/10031140034
CiNii Books: http://ci.nii.ac.jp/ncid/AA12293218
URL: https://jlc.jst.go.jp/DN/JALC/00362070277?from=CiNii

Identifiers

DOI: 10.9746/jcmsi.3.402
ISSN: 1882-4889
CiNii Articles ID: 10031140034
CiNii Books ID: AA12293218
