2017
Policy Gradient Reinforcement Learning Method for Discrete-Time Linear Quadratic Regulation Problem Using Estimated State Value Function
2017 56TH ANNUAL CONFERENCE OF THE SOCIETY OF INSTRUMENT AND CONTROL ENGINEERS OF JAPAN (SICE)
- Start page
- 653
- End page
- 657
- Language
- English
- Publication type
- Research paper (international conference proceedings)
- DOI
- 10.23919/SICE.2017.8105539
- Publisher
- IEEE
In this paper, we propose a policy gradient reinforcement learning method that directly estimates, from measurable data, the gradient of the state value function (V-function) with respect to the feedback coefficient matrix and uses this estimate for policy improvement. The proposed method is applicable to cases where the state-action value function (Q-function) is difficult to estimate, and it updates the policy in a direction that effectively reduces the accumulated cost.
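The idea of improving a linear feedback policy along the gradient of the accumulated cost can be illustrated with a minimal sketch. This is not the estimator proposed in the paper: the abstract does not give its details, so the sketch substitutes a central finite-difference gradient computed from simulated rollouts. The system matrices `A` and `B`, the weights `Q` and `R`, the initial state `x0`, the initial gain `K0`, the horizon, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def rollout_cost(A, B, Q, R, K, x0, horizon=200):
    """Accumulated quadratic cost from x0 under u = -K x (finite-horizon stand-in for V)."""
    x = x0.copy()
    cost = 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += float(x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return cost

def fd_gradient(cost_fn, K, eps=1e-5):
    """Central finite-difference estimate of d(cost)/dK, one entry of K at a time."""
    grad = np.zeros_like(K)
    for idx in np.ndindex(*K.shape):
        dK = np.zeros_like(K)
        dK[idx] = eps
        grad[idx] = (cost_fn(K + dK) - cost_fn(K - dK)) / (2 * eps)
    return grad

def improve_policy(cost_fn, K, iters=20, step0=0.1):
    """Gradient descent on K; the step is halved until the cost actually decreases."""
    for _ in range(iters):
        g = fd_gradient(cost_fn, K)
        step = step0
        while step > 1e-8 and not cost_fn(K - step * g) < cost_fn(K):
            step *= 0.5
        if step > 1e-8:
            K = K - step * g
    return K

# Hypothetical example system (a discretized double integrator), assumed for illustration.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
x0 = np.array([1.0, 0.0])
K0 = np.array([[0.5, 0.5]])  # assumed stabilizing initial gain

cost_fn = lambda K: rollout_cost(A, B, Q, R, K, x0)
J_init = cost_fn(K0)
K_new = improve_policy(cost_fn, K0)
J_new = cost_fn(K_new)
print("cost before:", J_init, "cost after:", J_new)
```

The sketch needs no Q-function: only the accumulated cost along trajectories is evaluated, and the gain is moved in the direction that reduces it. Unlike the method described in the abstract, it relies on a simulation model to generate the perturbed rollouts.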
- Link information
- ID information
- DOI : 10.23919/SICE.2017.8105539
- Web of Science ID : WOS:000418323700151