2016
Parameter Estimation of Multi-Objective Reinforcement Learning to Reach Arbitrary Pareto Solution
2016 IEEE INTERNATIONAL CONFERENCE ON AGENTS (IEEE ICA 2016)
- Start page
- 110
- End page
- 111
- Language
- English
- Publication type
- Research paper (international conference proceedings)
- DOI
- 10.1109/ICA.2016.17
- Publisher
- IEEE
Multi-Objective Reinforcement Learning (MORL) can be divided into two approaches according to the number of acquired policies. One approach learns a single policy that leads the agent to a single, arbitrarily chosen Pareto optimal solution; the other learns multiple policies, one corresponding to each Pareto optimal solution. The latter approach finds the multiple policies simultaneously, but it incurs significant computational cost. In many real-world cases, learning a single solution is sufficient in the multi-objective context. In this paper, we focus on the former approach, in which a suitable weight for each objective must be defined. To estimate the weight of each objective as a parameter, we utilize Q-values on the expert's trajectory, which indicates the optimal sequence of actions. This approach is an analogy drawn from apprenticeship learning via inverse reinforcement learning. We evaluate the proposed method on a well-known MORL benchmark problem, the Deep Sea Treasure environment.
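The single-policy approach described above scalarizes the vector of per-objective Q-values with a weight vector, so that each choice of weights steers the greedy policy toward a different Pareto solution. The following minimal sketch illustrates that idea only; the Q-values, action names, and weight vectors are hypothetical toy numbers loosely inspired by the two objectives of Deep Sea Treasure (treasure value vs. time penalty), not data from the paper.

```python
import numpy as np

def scalarize(q_vec, w):
    """Collapse a vector of per-objective Q-values into a scalar via weights w."""
    return float(np.dot(w, q_vec))

# Hypothetical vector Q-values for three candidate actions in some state,
# with two objectives: (treasure value, time penalty).
q_values = {
    "up":    np.array([0.0, -1.0]),
    "down":  np.array([5.0, -3.0]),
    "right": np.array([8.0, -6.0]),
}

# Different weight vectors make different actions greedy, i.e. each weight
# setting corresponds to a single policy aimed at a different Pareto solution.
w_time_averse = np.array([0.2, 0.8])  # illustrative weights, not from the paper
w_treasure    = np.array([0.9, 0.1])

def greedy(w):
    """Greedy action under the scalarized Q-values."""
    return max(q_values, key=lambda a: scalarize(q_values[a], w))

print(greedy(w_time_averse))  # → up
print(greedy(w_treasure))     # → right
```

The paper's contribution is the inverse direction: given an expert trajectory, estimate the weight vector (as parameters) from Q-values along that trajectory, so that the greedy policy reproduces the expert's Pareto solution.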
- ID information
  - Web of Science ID: WOS:000404437200025