Paper

Peer-reviewed
2019

A Method for Automatically Generating Speech-Accompanying Gestures Using a Bi-Directional LSTM Network (Bi-Directional LSTM Networkを用いた発話に伴うジェスチャの自動生成手法)

Transactions of the Japanese Society for Artificial Intelligence
  • 金子 直史
  • 竹内 健太
  • 長谷川 大
  • 白川 真一
  • 佐久田 博司
  • 鷲見 和彦

Volume
34
Issue
6
Pages
C-J41_1-12
Language
Japanese
Publication type
Research paper (academic journal)
DOI
10.1527/tjsai.C-J41
Publisher
The Japanese Society for Artificial Intelligence

We present a novel framework for automatic speech-driven generation of natural gesture motion. The proposed method consists of two steps. First, a deep network based on a Bi-Directional LSTM Network learns speech-gesture relationships with both forward and backward consistency over long time spans, regressing the full 3D skeletal pose of a human at each time step from perceptual features extracted from the input audio. Second, we apply combined temporal filters to smooth the generated pose sequences. We train the network on a speech-gesture dataset recorded with a headset and a marker-based motion capture system. We evaluate different acoustic features, network architectures, and temporal filters to validate the effectiveness of the proposed approach. We also conduct a subjective evaluation that compares our approach against real human gestures. The results show that the generated gestures are comparable in naturalness to the "original" human gestures and are rated significantly more natural than "mismatched" human gestures taken from a different utterance.
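The abstract describes a two-step pipeline: a bidirectional LSTM regresses a full 3D skeletal pose from per-frame acoustic features, and temporal filtering then smooths the generated pose sequence. The PyTorch sketch below illustrates only that overall shape; the feature dimension, joint count, layer sizes, and the simple moving-average filter are illustrative assumptions, not the acoustic features or combined filters actually used in the paper.

```python
# Minimal sketch of a speech-to-gesture Bi-LSTM pipeline, assuming
# hypothetical dimensions: 26 acoustic features per frame, 19 joints,
# and a moving-average filter standing in for the paper's combined
# temporal filters.
import torch
import torch.nn as nn


class SpeechToGestureBiLSTM(nn.Module):
    """Regress per-frame 3D skeletal poses from acoustic features."""

    def __init__(self, n_audio_features=26, n_joints=19, hidden_size=256):
        super().__init__()
        # bidirectional=True provides the forward and backward context
        # over the utterance that the abstract attributes to the network.
        self.lstm = nn.LSTM(
            input_size=n_audio_features,
            hidden_size=hidden_size,
            num_layers=2,
            batch_first=True,
            bidirectional=True,
        )
        # One 3D position (x, y, z) per joint at every time step.
        self.head = nn.Linear(2 * hidden_size, 3 * n_joints)

    def forward(self, audio_features):
        # audio_features: (batch, time, n_audio_features)
        hidden, _ = self.lstm(audio_features)
        return self.head(hidden)  # (batch, time, 3 * n_joints)


def smooth_poses(poses, window=5):
    """Moving-average temporal filter over the time axis.

    poses: (batch, time, pose_dim). `window` is assumed odd so that
    padding preserves the sequence length.
    """
    pose_dim = poses.shape[-1]
    # Depthwise 1-D convolution: each pose dimension is averaged
    # independently over `window` consecutive frames.
    kernel = torch.ones(pose_dim, 1, window) / window
    x = poses.transpose(1, 2)  # (batch, pose_dim, time) for conv1d
    x = nn.functional.conv1d(x, kernel, padding=window // 2, groups=pose_dim)
    return x.transpose(1, 2)


if __name__ == "__main__":
    model = SpeechToGestureBiLSTM()
    audio = torch.randn(1, 100, 26)     # 100 frames of acoustic features
    raw_poses = model(audio)            # step 1: per-frame pose regression
    smoothed = smooth_poses(raw_poses)  # step 2: temporal smoothing
    print(smoothed.shape)               # torch.Size([1, 100, 57])
```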

Links
DOI
https://doi.org/10.1527/tjsai.C-J41
CiNii Articles
http://ci.nii.ac.jp/naid/130007740807
URL
https://dblp.uni-trier.de/conf/iva/2018
URL
https://dblp.uni-trier.de/db/conf/iva/iva2018.html#HasegawaKSSS18
