Paper

Peer-reviewed
June 2011

Generalized TD Learning

JOURNAL OF MACHINE LEARNING RESEARCH
  • Tsuyoshi Ueno
  • Shin-ichi Maeda
  • Motoaki Kawanabe
  • Shin Ishii

Volume
12
First page
1977
Last page
2020
Language
English
Publication type
Research paper (scientific journal)
Publisher
MICROTOME PUBL

Since the invention of temporal difference (TD) learning (Sutton, 1988), many new algorithms for model-free policy evaluation have been proposed. Although they have brought much progress in practical applications of reinforcement learning (RL), fundamental problems remain concerning the statistical properties of value function estimation. To address these problems, we introduce a new framework, semiparametric statistical inference, to model-free policy evaluation. This framework generalizes TD learning and its extensions, and allows us to investigate the statistical properties of both batch and online learning procedures for value function estimation in a unified way in terms of estimating functions. Furthermore, based on this framework, we derive an optimal estimating function with the minimum asymptotic variance and propose batch and online learning algorithms that achieve this optimality.
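
As a rough illustration of the estimating-function view mentioned in the abstract, the sketch below solves the equation sum_t w_t (r_t + gamma * phi(s_{t+1})^T theta - phi(s_t)^T theta) = 0 for a linear value function. It is not the paper's algorithm: the three-state chain, the rewards, the one-hot features, and the helper names `simulate_chain` and `solve_estimating_equation` are all assumptions made for this example. The weight choice w_t = phi(s_t) used here is the one that recovers the familiar batch LSTD estimate; the optimal weight derived in the paper is not reproduced.

```python
import numpy as np


def simulate_chain(P, r, n_steps, rng):
    """Sample one trajectory from a Markov reward process (hypothetical toy model)."""
    n_states = P.shape[0]
    states = np.empty(n_steps + 1, dtype=int)
    states[0] = rng.integers(n_states)
    for t in range(n_steps):
        states[t + 1] = rng.choice(n_states, p=P[states[t]])
    rewards = r[states[:-1]]  # reward depends only on the current state
    return states, rewards


def solve_estimating_equation(phi, phi_next, rewards, weights, gamma):
    """Solve sum_t w_t * (r_t + gamma * phi_next_t @ theta - phi_t @ theta) = 0.

    The equation is linear in theta, so it reduces to A @ theta = b.
    """
    A = weights.T @ (phi - gamma * phi_next)
    b = weights.T @ rewards
    return np.linalg.solve(A, b)


rng = np.random.default_rng(0)

# Assumed toy problem: 3 states, fixed policy folded into P.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
r = np.array([0.0, 1.0, 2.0])
gamma = 0.9

states, rewards = simulate_chain(P, r, 20000, rng)

features = np.eye(3)            # one-hot (tabular) features
phi = features[states[:-1]]     # phi(s_t)
phi_next = features[states[1:]] # phi(s_{t+1})

# Weight w_t = phi(s_t) gives the LSTD estimate.
theta = solve_estimating_equation(phi, phi_next, rewards, phi, gamma)

# Exact value function for comparison: V = (I - gamma * P)^{-1} r.
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)

print("estimate:", theta)
print("exact:   ", v_true)
```

Passing a different `weights` array (any function of the data observed up to time t) yields a different estimator from the same family, which is the degree of freedom that the abstract's notion of an optimal estimating function refers to.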

Web of Science ® citation count: 5

Links
Web of Science
https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000293757200007&DestApp=WOS_CPL
URL
http://dl.acm.org/citation.cfm?id=2021063
URL
http://dblp.uni-trier.de/db/journals/jmlr/jmlr12.html#journals/jmlr/UenoMKI11
Identifiers
  • ISSN : 1532-4435
  • DBLP ID : journals/jmlr/UenoMKI11
  • Web of Science ID : WOS:000293757200007
