Paper

Peer-reviewed
2016

Parameter Estimation of Multi-Objective Reinforcement Learning to Reach Arbitrary Pareto Solution

2016 IEEE INTERNATIONAL CONFERENCE ON AGENTS (IEEE ICA 2016)
  • Ryosuke Saitake
  • Sachiyo Arai

Start page
110
End page
111
Language
English
Publication type
Research paper (international conference proceedings)
DOI
10.1109/ICA.2016.17
Publisher
IEEE

Multi-Objective Reinforcement Learning (MORL) can be divided into two approaches according to the number of acquired policies. One approach learns a single policy that leads the agent to a single arbitrary Pareto optimal solution; the other learns multiple policies, one for each Pareto optimal solution. The latter approach finds the multiple policies simultaneously, but it incurs significant computational cost. In many real-world cases, learning a single solution is sufficient in the multi-objective context. In this paper, we focus on the former approach, in which a suitable weight for each objective must be defined. To estimate the weight of each objective as a parameter, we utilize Q-values on the expert's trajectory, which indicates the optimal sequence of actions. This approach is an analogy drawn from apprenticeship learning via inverse reinforcement learning. We evaluate the proposed method on a well-known MORL benchmark problem, the Deep Sea Treasure environment.
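The single-policy approach described in the abstract typically collapses the vector-valued reward into a scalar via a per-objective weight vector, then runs standard Q-learning on the scalarized reward. The following is a minimal illustrative sketch of that mechanism (function and variable names are hypothetical; this is not the paper's code):

```python
import numpy as np

def scalarize(reward_vec, weights):
    """Linearly scalarize a vector reward: r = w . r_vec."""
    return float(np.dot(weights, reward_vec))

def q_update(Q, s, a, reward_vec, s_next, weights, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step on the scalarized reward.

    Q is a dict mapping state -> {action: value}. The weights encode the
    trade-off between objectives; the paper estimates them from Q-values
    along an expert trajectory rather than fixing them by hand.
    """
    r = scalarize(reward_vec, weights)
    td_target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]
```

In a two-objective problem such as Deep Sea Treasure (treasure value vs. time penalty), `reward_vec` would have two components, and different weight vectors steer the learned single policy toward different Pareto optimal solutions.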

Link information
DOI
https://doi.org/10.1109/ICA.2016.17
Web of Science
https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000404437200025&DestApp=WOS_CPL
ID information
  • DOI : 10.1109/ICA.2016.17
  • Web of Science ID : WOS:000404437200025
