Paper

Peer-reviewed
2017

Policy Gradient Reinforcement Learning Method for Discrete-Time Linear Quadratic Regulation Problem Using Estimated State Value Function

2017 56th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE)
  • Tomotake Sasaki
  • Eiji Uchibe
  • Hidenao Iwane
  • Hitoshi Yanami
  • Hirokazu Anai
  • Kenji Doya

First page
653
Last page
657
Language
English
Publication type
Research paper (international conference proceedings)
DOI
10.23919/SICE.2017.8105539
Publisher
IEEE

In this paper, we propose a policy gradient reinforcement learning method that directly estimates the gradient of the state value function (V-function) with respect to a feedback coefficient matrix from measurable data and uses this estimate for policy improvement. The proposed method is applicable to cases where the state-action value function (Q-function) is difficult to estimate, and it updates the policy in a direction that effectively reduces the accumulated cost.
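To make the idea of improving a linear feedback policy along the gradient of the accumulated cost concrete, the following is a minimal sketch and not the estimator proposed in the paper. It assumes a known, simulated plant and substitutes a central finite-difference estimate from rollouts for the paper's data-driven gradient estimate. The plant matrices, the function names (`rollout_cost`, `mean_cost`, `estimate_gradient`, `policy_gradient_lqr`), the finite horizon, and the step-halving rule are all hypothetical choices made for illustration.

```python
import numpy as np


def rollout_cost(A, B, Q, R, K, x0, horizon=200):
    """Accumulated quadratic cost of the feedback law u = -K x, starting at x0."""
    x = np.asarray(x0, dtype=float)
    cost = 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += float(x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return cost


def mean_cost(A, B, Q, R, K, x0s):
    """Average accumulated cost over initial states (a stand-in for the V-function)."""
    return float(np.mean([rollout_cost(A, B, Q, R, K, x0) for x0 in x0s]))


def estimate_gradient(A, B, Q, R, K, x0s, eps=1e-4):
    """Central finite-difference estimate of the gradient of mean_cost w.r.t. K."""
    grad = np.zeros_like(K)
    for idx in np.ndindex(*K.shape):
        dK = np.zeros_like(K)
        dK[idx] = eps
        plus = mean_cost(A, B, Q, R, K + dK, x0s)
        minus = mean_cost(A, B, Q, R, K - dK, x0s)
        grad[idx] = (plus - minus) / (2.0 * eps)
    return grad


def policy_gradient_lqr(A, B, Q, R, K0, x0s, step=0.05, iters=20):
    """Gradient descent on K with step halving; returns the final K and cost history."""
    K = np.array(K0, dtype=float)
    costs = [mean_cost(A, B, Q, R, K, x0s)]
    for _ in range(iters):
        grad = estimate_gradient(A, B, Q, R, K, x0s)
        s = step
        for _ in range(30):  # halve the step until the cost decreases
            K_new = K - s * grad
            c_new = mean_cost(A, B, Q, R, K_new, x0s)
            if c_new < costs[-1]:
                K = K_new
                costs.append(c_new)
                break
            s *= 0.5
        else:
            costs.append(costs[-1])  # no improving step found; keep K unchanged
    return K, costs


# Hypothetical open-loop stable plant, so K = 0 is a valid starting policy.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
x0s = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

K, costs = policy_gradient_lqr(A, B, Q, R, np.zeros((1, 2)), x0s)
print("initial cost:", costs[0], "final cost:", costs[-1])
print("feedback gain K:", K)
```

The step-halving loop is only a safeguard that keeps each update from increasing the cost. A method that works from measured data, as the abstract describes, would replace `estimate_gradient` with an estimate that does not require simulating the plant model.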

Links
DOI
https://doi.org/10.23919/SICE.2017.8105539
Web of Science
https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000418323700151&DestApp=WOS_CPL
URL
http://www.sice.or.jp/sice2017/
ID information
  • DOI : 10.23919/SICE.2017.8105539
  • Web of Science ID : WOS:000418323700151
