Non-Asymptotic Analysis of a New Bandit Algorithm for Semi-Bounded Reward

JOURNAL OF MACHINE LEARNING RESEARCH

Honda, Junya
Takemura, Akimichi

巻: 16
号
開始ページ: 3721
終了ページ: 3756
記述言語: 英語
掲載種別: 研究論文（学術雑誌）
出版者・発行元: MICROTOME PUBL

In this paper we consider a stochastic multiarmed bandit problem. It is known in this problem that Deterministic Minimum Empirical Divergence (DMED) policy achieves the asymptotic theoretical bound for the model where each reward distribution is supported in a known bounded interval, say [0; 1]. However, the regret bound of DMED is described in an asymptotic form and the performance in finite time has been unknown. We modify this policy and derive a finite-time regret bound for the new policy, Indexed Minimum Empirical Divergence (IMED), by refining large deviation probabilities to a simple nonasymptotic form. Further, the refined analysis reveals that the finite-time regret bound is valid even in the case that the reward is not bounded from below. Therefore, our finitetime result applies to the case that the minimum reward (that is, the maximum loss) is unknown or unbounded. We also present some simulation results which shows that IMED much improves DMED and performs competitively to other state-of-the-art policies.

リンク情報

Web of Science: https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=JSTA_CEL&SrcApp=J_Gate_JST&DestLinkType=FullRecord&KeyUT=WOS:000369888000042&DestApp=WOS_CPL

ID情報

ISSN : 1532-4435
Web of Science ID : WOS:000369888000042

エクスポート: BibTeX RIS

本多淳也

論文

Non-Asymptotic Analysis of a New Bandit Algorithm for Semi-Bounded Reward

メニュー

共著者の一覧