Policy Gradient Reinforcement Learning Method for Discrete-Time Linear Quadratic Regulation Problem Using Estimated State Value Function

2017 56th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE)

Tomotake Sasaki
Eiji Uchibe
Hidenao Iwane
Hitoshi Yanami
Hirokazu Anai
Kenji Doya

Start page: 653
End page: 657
Language: English
Publication type: Research paper (international conference proceedings)
DOI: 10.23919/SICE.2017.8105539
Publisher: IEEE

In this paper, we propose a policy gradient reinforcement learning method that directly estimates the gradient of the state value function (V-function) with respect to a feedback coefficient matrix from measurable data and uses it for policy improvement. The proposed method is applicable to cases where the state-action value function (Q-function) is difficult to estimate, and it can update the policy in a direction that effectively reduces the accumulated cost.
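The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes a linear state-feedback policy u = -K x, a quadratic stage cost x'Qx + u'Ru, and a plain central finite-difference estimate of the gradient of the accumulated cost with respect to K, computed from rollouts. The function names, the finite horizon, the step size, and the use of the matrices A and B as a stand-in simulator for the measured plant are all assumptions made for illustration.

```python
import numpy as np


def rollout_cost(A, B, Q, R, K, x0, horizon=200):
    """Accumulated quadratic cost of the policy u = -K x started at x0.

    A and B act here as a simulator that produces the state data; they are
    a hypothetical stand-in for measurements taken from a real plant.
    """
    x = np.array(x0, dtype=float)
    cost = 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += float(x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return cost


def estimate_value_gradient(A, B, Q, R, K, x0, eps=1e-5):
    """Central finite-difference estimate of dV(x0)/dK, one entry at a time.

    This is an assumed, generic estimator chosen for simplicity; it is not
    the estimator proposed in the paper.
    """
    grad = np.zeros_like(K, dtype=float)
    for i in range(K.shape[0]):
        for j in range(K.shape[1]):
            dK = np.zeros_like(K, dtype=float)
            dK[i, j] = eps
            plus = rollout_cost(A, B, Q, R, K + dK, x0)
            minus = rollout_cost(A, B, Q, R, K - dK, x0)
            grad[i, j] = (plus - minus) / (2.0 * eps)
    return grad


def improve_policy(A, B, Q, R, K, x0, step=1e-3, iters=50):
    """Gradient descent on the feedback gain K using the estimated gradient."""
    K = np.array(K, dtype=float)
    for _ in range(iters):
        K = K - step * estimate_value_gradient(A, B, Q, R, K, x0)
    return K
```

With a stabilizing initial gain and a small enough step, each update moves K in a direction that lowers the accumulated cost from x0, which is the behaviour the abstract describes. No Q-function is estimated anywhere in this sketch.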

Link information

DOI: https://doi.org/10.23919/SICE.2017.8105539
Web of Science: https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000418323700151&DestApp=WOS_CPL
URL: http://www.sice.or.jp/sice2017/

ID information

DOI : 10.23919/SICE.2017.8105539
Web of Science ID : WOS:000418323700151
