2013年2月20日

ロックアップ期間による制約を考慮した確率的バンディット問題

研究報告数理モデル化と問題解決（MPS）

小宮山純平
佐藤一誠
中川裕志

巻: 2013
号: 10
開始ページ: 1
終了ページ: 6
記述言語: 日本語
掲載種別

バンディット問題は，複数のアーム（選択肢）から最も報酬の高いものを探す問題であり，探索と活用のトレードオフの代表的なモデルの1つである．近年において，情報推薦，最適経路探索，最適化，モデル選択などの分野への応用を動機として，バンディット問題は機械学習やオペレーション・リサーチの分野において注目を浴びている．本稿はロックアップ期間（選択するアームを変更できない期間）の制約を考慮したバンディット問題を提案し，どのような方策を取れば良いかを調べる．既存の多くの有益なアルゴリズムがロックアップ期間を含めた場合に自然に拡張可能であることを示し，その regret （性能）を評価する．この regret がロックアップ期間の最大の大きさに依存することを示す．さらに，ロックアップ期間が大きい場合に regret を減らすことができる Balancing and Recommendation （BaR）メタアルゴリズムを提案する．また，計算機実験の結果を示し，理論的な結果と比較し考察する．The multi-armed bandit problem, which is widely used to model sequential decision making, has recently attracted much attention, which has led to proposals for a wide range of apprications, such as recommendation, adaptive routing, optimization, tree search, and model selection. In actual applications, the choice of the forecaster is often restricted. This paper aims to model this restriction. The results of both theoretical analysis and empirical evaluation are presented.

リンク情報

CiNii Articles: http://ci.nii.ac.jp/naid/110009550158
CiNii Books: http://ci.nii.ac.jp/ncid/AN10505667
URL: http://id.nii.ac.jp/1001/00090412/

ID情報

CiNii Articles ID : 110009550158
CiNii Books ID : AN10505667

エクスポート: BibTeX RIS

佐藤一誠

MISC

ロックアップ期間による制約を考慮した確率的バンディット問題

メニュー

共著者の一覧