Papers

Peer-reviewed
2009

Optimal Online Learning Procedures for Model-Free Policy Evaluation

Machine Learning and Knowledge Discovery in Databases, Part II
  • Tsuyoshi Ueno
  • Shin-ichi Maeda
  • Motoaki Kawanabe
  • Shin Ishii

Volume
5782
First page
473
Last page
+
Language
English
Publishing type
Research paper (international conference proceedings)
DOI
10.1007/978-3-642-04174-7_31
Publisher
Springer-Verlag Berlin

In this study, we extend the framework of semiparametric statistical inference, recently introduced to reinforcement learning [1], to online learning procedures for policy evaluation. This generalization lets us analyze the statistical properties of value function estimators obtained by both batch and online procedures in a unified way, in terms of estimating functions. Furthermore, we propose a novel online learning algorithm that uses optimal estimating functions, which achieve the minimum estimation error. We confirm our theoretical results on a simple chain walk problem.
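To illustrate the estimating-function view of batch and online policy evaluation, here is a minimal sketch under assumptions of our own; it is not the authors' optimal estimator. It uses the standard temporal-difference estimating function phi(s_t) * (r_t + gamma * V(s_{t+1}) - V(s_t)) with one-hot features: batch LSTD solves the empirical estimating equation directly, while online TD(0) tracks its root by stochastic approximation. The chain walk (5 states, reflecting ends, reward 1 on entering the rightmost state, gamma = 0.9), the per-state step size visits^(-0.7), and the helper names `solve`, `step`, `true_values`, and `evaluate` are hypothetical choices for illustration.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def step(s, n, rng):
    """Hypothetical chain walk (an assumption, not the paper's exact setup):
    move left or right with probability 0.5, reflecting at both ends;
    reward 1.0 whenever the next state is the rightmost one."""
    s_next = max(s - 1, 0) if rng.random() < 0.5 else min(s + 1, n - 1)
    return s_next, 1.0 if s_next == n - 1 else 0.0


def true_values(n, gamma):
    """Exact values from the Bellman equation (I - gamma P) V = R."""
    P = [[0.0] * n for _ in range(n)]
    for s in range(n):
        P[s][max(s - 1, 0)] += 0.5
        P[s][min(s + 1, n - 1)] += 0.5
    R = [P[s][n - 1] for s in range(n)]  # expected reward = Pr(next state is n-1)
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, R)


def evaluate(n=5, gamma=0.9, steps=300_000, omega=0.7, seed=0):
    """Estimate V from one trajectory with one-hot features phi(s) = e_s.

    Both estimators target the root of the TD estimating equation
    sum_t phi(s_t) * (r_t + gamma * V(s_{t+1}) - V(s_t)) = 0.
    Returns (online TD(0) estimate, batch LSTD estimate).
    """
    rng = random.Random(seed)
    v_td = [0.0] * n
    visits = [0] * n
    A = [[0.0] * n for _ in range(n)]  # sum_t phi_t (phi_t - gamma phi_{t+1})^T
    b = [0.0] * n                      # sum_t phi_t r_t
    s = 0
    for _ in range(steps):
        s_next, r = step(s, n, rng)
        # Online: one stochastic-approximation step along the estimating function,
        # with a per-state step size visits^(-omega) (an assumed schedule).
        visits[s] += 1
        alpha = visits[s] ** -omega
        v_td[s] += alpha * (r + gamma * v_td[s_next] - v_td[s])
        # Batch: accumulate the empirical estimating equation for LSTD.
        A[s][s] += 1.0
        A[s][s_next] -= gamma
        b[s] += r
        s = s_next
    return v_td, solve(A, b)


if __name__ == "__main__":
    v_true = true_values(5, 0.9)
    v_td, v_lstd = evaluate()
    for s, (t, o, l) in enumerate(zip(v_true, v_td, v_lstd)):
        print(f"state {s}: true={t:.3f}  TD(0)={o:.3f}  LSTD={l:.3f}")
```

Both estimates should approach the exact values returned by `true_values`. Here TD(0) and LSTD share a single fixed estimating function; the algorithm proposed in the paper instead chooses the estimating function to minimize estimation error, which this sketch does not attempt.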

Link information
DOI
https://doi.org/10.1007/978-3-642-04174-7_31
DBLP
https://dblp.uni-trier.de/rec/conf/pkdd/UenoMKI09
Web of Science
https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000272076400031&DestApp=WOS_CPL
URL
http://dblp.uni-trier.de/db/conf/pkdd/pkdd2009-2.html#conf/pkdd/UenoMKI09
ID information
  • DOI : 10.1007/978-3-642-04174-7_31
  • ISSN : 0302-9743
  • DBLP ID : conf/pkdd/UenoMKI09
  • Web of Science ID : WOS:000272076400031
