2009
Optimal Online Learning Procedures for Model-Free Policy Evaluation
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II
- Volume : 5782
- First page : 473
- Last page : +
- Language : English
- Publishing type : Research paper (international conference proceedings)
- DOI : 10.1007/978-3-642-04174-7_31
- Publisher : SPRINGER-VERLAG BERLIN
In this study, we extend the framework of semiparametric statistical inference, recently introduced to reinforcement learning [1], to online learning procedures for policy evaluation. This generalization lets us investigate the statistical properties of value function estimators obtained by both batch and online procedures in a unified way, in terms of estimating functions. Furthermore, we propose a novel online learning algorithm based on optimal estimating functions that achieve the minimum estimation error. Our theoretical developments are confirmed using a simple chain walk problem.
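The abstract does not spell out the proposed algorithm or the exact chain walk setup, so the following is only a generic sketch of online model-free policy evaluation, not the paper's method. It uses standard tabular TD(0), which can be read as a stochastic-approximation solver for the estimating equation E[r + γV(s') − V(s) | s] = 0. The five-state reflecting chain, the reward of 1 on entering the right-most state, the uniformly random policy, and the step-size schedule `c / (c + visits)` are all assumptions made for illustration.

```python
import random


def step(state, n_states, rng):
    """One transition of the assumed chain walk under a uniformly random policy."""
    move = rng.choice((-1, 1))
    # Reflecting boundaries: moving off either end leaves the state unchanged.
    next_state = min(max(state + move, 0), n_states - 1)
    # Assumed reward: 1 whenever the walk lands on the right-most state.
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward


def online_td0(n_states=5, gamma=0.9, n_steps=200_000, c=10.0, seed=0):
    """Tabular TD(0): update the value estimate after every single transition."""
    rng = random.Random(seed)
    values = [0.0] * n_states
    visits = [0] * n_states
    state = n_states // 2
    for _ in range(n_steps):
        next_state, reward = step(state, n_states, rng)
        # Per-state decaying step size (an illustrative choice, not from the paper).
        alpha = c / (c + visits[state])
        td_error = reward + gamma * values[next_state] - values[state]
        values[state] += alpha * td_error
        visits[state] += 1
        state = next_state
    return values


def exact_values(n_states=5, gamma=0.9, sweeps=2000):
    """Reference values from the known model, via repeated Bellman backups."""
    values = [0.0] * n_states
    for _ in range(sweeps):
        new_values = []
        for s in range(n_states):
            total = 0.0
            for move in (-1, 1):
                nxt = min(max(s + move, 0), n_states - 1)
                r = 1.0 if nxt == n_states - 1 else 0.0
                total += 0.5 * (r + gamma * values[nxt])
            new_values.append(total)
        values = new_values
    return values


if __name__ == "__main__":
    estimate = online_td0()
    reference = exact_values()
    for s, (v_hat, v) in enumerate(zip(estimate, reference)):
        print(f"state {s}: online estimate {v_hat:.3f}, model-based value {v:.3f}")
```

Comparing the online estimate against the model-based reference mirrors, in a very rough way, the kind of estimation-error check the abstract describes; the model is used only to compute the reference, never by the online learner.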
- Link information
- DOI : https://doi.org/10.1007/978-3-642-04174-7_31
- DBLP : https://dblp.uni-trier.de/rec/conf/pkdd/UenoMKI09
- Web of Science : https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000272076400031&DestApp=WOS_CPL
- URL : http://dblp.uni-trier.de/db/conf/pkdd/pkdd2009-2.html#conf/pkdd/UenoMKI09
- ID information
- DOI : 10.1007/978-3-642-04174-7_31
- ISSN : 0302-9743
- DBLP ID : conf/pkdd/UenoMKI09
- Web of Science ID : WOS:000272076400031