2016
Parameter Estimation of Multi-Objective Reinforcement Learning to Reach Arbitrary Pareto Solution
2016 IEEE INTERNATIONAL CONFERENCE ON AGENTS (IEEE ICA 2016)
- Start page
- 110
- End page
- 111
- Language
- English
- Publication type
- Research paper (international conference proceedings)
- DOI
- 10.1109/ICA.2016.17
- Publisher
- IEEE
Multi-Objective Reinforcement Learning (MORL) can be divided into two approaches according to the number of acquired policies. One approach learns a single policy that leads the agent to a single, arbitrarily chosen Pareto optimal solution; the other learns multiple policies, one corresponding to each Pareto optimal solution. The latter approach finds the multiple policies simultaneously, but it incurs significant computational cost. In many real-world cases, learning a single solution is sufficient in the multi-objective context. In this paper, we focus on the former approach, in which a suitable weight for each objective must be defined. To estimate the weight of each objective as a parameter, we utilize Q-values on the expert's trajectory, which indicates the optimal sequence of actions. This approach is an analogy drawn from apprenticeship learning via inverse reinforcement learning. We evaluate the proposed method on a well-known MORL benchmark problem, the Deep Sea Treasure environment.
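The single-policy approach described above scalarizes the vector of per-objective Q-values with a weight vector, so that each choice of weights steers the greedy policy toward a different Pareto solution. The following minimal sketch illustrates that idea only; the Q-values, action names, and weight vectors are hypothetical toy numbers loosely inspired by the two objectives of Deep Sea Treasure (treasure value vs. time penalty), not data from the paper.

```python
import numpy as np

def scalarize(q_vec, w):
    """Collapse a vector of per-objective Q-values into a scalar via weights w."""
    return float(np.dot(w, q_vec))

# Hypothetical vector Q-values for three candidate actions in some state,
# with two objectives: (treasure value, time penalty).
q_values = {
    "up":    np.array([0.0, -1.0]),
    "down":  np.array([5.0, -3.0]),
    "right": np.array([8.0, -6.0]),
}

# Different weight vectors make different actions greedy, i.e. each weight
# setting corresponds to a single policy aimed at a different Pareto solution.
w_time_averse = np.array([0.2, 0.8])  # illustrative weights, not from the paper
w_treasure    = np.array([0.9, 0.1])

def greedy(w):
    """Greedy action under the scalarized Q-values."""
    return max(q_values, key=lambda a: scalarize(q_values[a], w))

print(greedy(w_time_averse))  # → up
print(greedy(w_treasure))     # → right
```

The paper's contribution is the inverse direction: given an expert trajectory, estimate the weight vector (as parameters) from Q-values along that trajectory, so that the greedy policy reproduces the expert's Pareto solution.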
- ID information
  - Web of Science ID: WOS:000404437200025