Optimal Online Learning Procedures for Model-Free Policy Evaluation

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II

Tsuyoshi Ueno
Shin-ichi Maeda
Motoaki Kawanabe
Shin Ishii

Volume: 5782
First page: 473
Last page: +
Language: English
Publishing type: Research paper (international conference proceedings)
DOI: 10.1007/978-3-642-04174-7_31
Publisher: SPRINGER-VERLAG BERLIN

In this study, we extend the framework of semiparametric statistical inference, recently introduced to reinforcement learning [1], to online learning procedures for policy evaluation. This generalization enables us to investigate the statistical properties of value function estimators obtained by both batch and online procedures in a unified way, in terms of estimating functions. Furthermore, we propose a novel online learning algorithm with optimal estimating functions that achieve the minimum estimation error. Our theoretical developments are confirmed using a simple chain walk problem.

Link information

DOI: https://doi.org/10.1007/978-3-642-04174-7_31
DBLP: https://dblp.uni-trier.de/rec/conf/pkdd/UenoMKI09
Web of Science: https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000272076400031&DestApp=WOS_CPL
URL: http://dblp.uni-trier.de/db/conf/pkdd/pkdd2009-2.html#conf/pkdd/UenoMKI09

ID information

DOI : 10.1007/978-3-642-04174-7_31
ISSN : 0302-9743
DBLP ID : conf/pkdd/UenoMKI09
Web of Science ID : WOS:000272076400031
