Neural networks and differential dynamic programming for reinforcement learning problems

Proceedings - IEEE International Conference on Robotics and Automation

Akihiko Yamaguchi
Christopher G. Atkeson

Volume: 2016-June
Issue:
First page: 5434
Last page: 5441
Language: English
Publication type: Research paper (international conference proceedings)
DOI: 10.1109/ICRA.2016.7487755
Publisher: IEEE

© 2016 IEEE. We explore a model-based approach to reinforcement learning where partially or totally unknown dynamics are learned and explicit planning is performed. We learn dynamics with neural networks, and plan behaviors with differential dynamic programming (DDP). In order to handle complicated dynamics, such as manipulating liquids (pouring), we consider temporally decomposed dynamics. We start from our recent work [1] where we used locally weighted regression (LWR) to model dynamics. The major contribution of this paper is making use of deep learning in the form of neural networks with stochastic DDP, and showing the advantages of neural networks over LWR. For this purpose, we extend neural networks for: (1) modeling prediction error and output noise, (2) computing an output probability distribution for a given input distribution, and (3) computing gradients of output expectation with respect to an input. Since neural networks have nonlinear activation functions, these extensions were not easy. We provide an analytic solution for these extensions using some simplifying assumptions. We verified this method in pouring simulation experiments. The learning performance with neural networks was better than that of LWR. The amount of spilled materials was reduced. We also present early results of robot experiments using a PR2. Accompanying video: https://youtu.be/aM3hE1J5W98
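
Extensions (2) and (3) in the abstract can be illustrated with a small sketch. The abstract does not state the paper's analytic solution or its simplifying assumptions, so the code below substitutes a standard first-order (linearization) approximation and should not be read as the authors' method. The toy network, its weights, and the names `forward`, `jacobian`, and `propagate_gaussian` are all hypothetical. Given a Gaussian input with mean `mean` and covariance `cov`, the output mean is approximated by the network evaluated at `mean`, the output covariance by `J cov J^T`, and the gradient of the output expectation with respect to the input mean by the Jacobian `J`.

```python
import math

# Hypothetical toy network: one tanh hidden layer, linear output.
# Weights are arbitrary illustrative values, not taken from the paper.
W1 = [[0.5, -0.3], [0.8, 0.2], [-0.4, 0.9]]   # 3 hidden units, 2 inputs
b1 = [0.1, -0.2, 0.0]
W2 = [[1.0, -0.5, 0.3]]                        # 1 output, 3 hidden units
b2 = [0.05]


def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Matrix-matrix product for nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def forward(x):
    """Network output for a single deterministic input x."""
    h = [math.tanh(z + b) for z, b in zip(matvec(W1, x), b1)]
    return [y + b for y, b in zip(matvec(W2, h), b2)]


def jacobian(x):
    """Analytic Jacobian dy/dx = W2 * diag(1 - tanh^2) * W1."""
    h = [math.tanh(z + b) for z, b in zip(matvec(W1, x), b1)]
    scaled = [[(1.0 - hi * hi) * w for w in row] for hi, row in zip(h, W1)]
    return matmul(W2, scaled)


def propagate_gaussian(mean, cov):
    """First-order approximation (an assumption of this sketch):
    output mean ~ f(mean), output covariance ~ J cov J^T."""
    J = jacobian(mean)
    out_mean = forward(mean)
    out_cov = matmul(matmul(J, cov), transpose(J))
    return out_mean, out_cov, J


mean_in = [0.2, -0.1]
cov_in = [[0.04, 0.01], [0.01, 0.09]]
out_mean, out_cov, grad = propagate_gaussian(mean_in, cov_in)
print("output mean:", out_mean)
print("output covariance:", out_cov)
print("gradient of expected output w.r.t. input mean:", grad)
```

The linearization is accurate only when the input covariance is small relative to the curvature of the activation function. That limitation is one reason a planner such as stochastic DDP would benefit from a more careful treatment of the nonlinearity, which is what the abstract says the paper provides.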

Links

DOI: https://doi.org/10.1109/ICRA.2016.7487755
DBLP: https://dblp.uni-trier.de/rec/conf/icra/YamaguchiA16
Web of Science: https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000389516204091&DestApp=WOS_CPL
Scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84977504694&origin=inward
Scopus Citedby: https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=84977504694&origin=inward
URL: http://dblp.uni-trier.de/db/conf/icra/icra2016.html#conf/icra/YamaguchiA16

Identifiers

DOI : 10.1109/ICRA.2016.7487755
ISSN : 1050-4729
DBLP ID : conf/icra/YamaguchiA16
ORCID Put Code : 25903277
SCOPUS ID : 84977504694
Web of Science ID : WOS:000389516204091
