2015
Audio-visual scene understanding utilizing text information for a cooking support robot
2015 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS)
- Start page
- 4210
- End page
- 4215
- Language
- English
- Publication type
- Research paper (international conference proceedings)
- Publisher
- IEEE
This paper addresses multimodal "scene understanding" for a robot using audio-visual and text information. Scene understanding is defined as extracting six-W information, i.e., What, When, Where, Who, Why, and hoW, about the surrounding environment. Although scene understanding for a robot has been studied in the fields of robot vision and robot audition, only the first four Ws have been considered, leaving out why and how information. We therefore focus on extracting how information, in particular in cooking scenes. In cooking scenes, we define how information as a cooking procedure, which enables a robot to give appropriate cooking advice. To realize such cooking support, we propose a multimodal cooking procedure recognition framework consisting of a Convolutional Neural Network (CNN) and a Hierarchical Hidden Markov Model (HHMM). The CNN, known as one of the most advanced classifiers, is applied to recognize cooking events from audio and visual information. The HHMM models a cooking procedure represented as a sequence of cooking events, where the relationships between cooking events are defined using text data obtained from the web, and the cooking events themselves are classified by the CNN. Our proposed framework thus integrates these three modalities. We constructed an interactive cooking support system based on the proposed framework, which advises the next step in the current cooking procedure through human-robot communication. Preliminary results with simulated and real recorded multimodal scenes showed the robustness of the proposed framework in noisy and/or occluded situations.
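The abstract's pipeline, classifying cooking events from sensor data and then decoding the underlying procedure from the event sequence, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the CNN is replaced by hand-picked per-frame event labels, the HHMM is simplified to a plain first-order HMM decoded with the Viterbi algorithm, and all event names, step names, and probabilities are invented assumptions (the paper derives such inter-event structure from web-crawled recipe text).

```python
# Hedged sketch of a CNN->HMM cooking-procedure decoder.
# Assumptions (not from the paper): the event/step labels below and all
# probability tables are illustrative; a real system would use CNN
# posteriors over events and a learned hierarchical model.
import math

STEPS = ["prepare", "mix", "cook"]   # hypothetical procedure steps (hidden states)

# P(next step | current step): illustrative transition table.
TRANS = {
    "prepare": {"prepare": 0.6,  "mix": 0.3,  "cook": 0.1},
    "mix":     {"prepare": 0.1,  "mix": 0.6,  "cook": 0.3},
    "cook":    {"prepare": 0.05, "mix": 0.15, "cook": 0.8},
}

# P(observed cooking event | procedure step): illustrative emission table.
EMIT = {
    "prepare": {"cutting": 0.7, "stirring": 0.2, "frying": 0.1},
    "mix":     {"cutting": 0.2, "stirring": 0.7, "frying": 0.1},
    "cook":    {"cutting": 0.1, "stirring": 0.2, "frying": 0.7},
}

START = {"prepare": 0.8, "mix": 0.1, "cook": 0.1}  # initial step prior

def viterbi(events):
    """Return the most likely procedure-step sequence for an event sequence."""
    # Log-domain Viterbi to avoid numerical underflow on long sequences.
    v = [{s: math.log(START[s]) + math.log(EMIT[s][events[0]]) for s in STEPS}]
    back = []
    for obs in events[1:]:
        col, ptr = {}, {}
        for s in STEPS:
            prev = max(STEPS, key=lambda p: v[-1][p] + math.log(TRANS[p][s]))
            col[s] = v[-1][prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    # Backtrack from the best final state.
    path = [max(STEPS, key=lambda s: v[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Events as a per-frame classifier might label them:
observed = ["cutting", "cutting", "stirring", "stirring", "frying"]
print(viterbi(observed))  # -> ['prepare', 'prepare', 'mix', 'mix', 'cook']
```

A support system built on this idea would look up the decoded final step (here "cook") in its recipe model and suggest the most probable following step to the user.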
- Link information
- ID information
- ISSN : 2153-0858
- Web of Science ID : WOS:000371885404059