2015
DOI: 10.1007/s10479-015-1965-7

Robust control of the multi-armed bandit problem

Abstract: We study a robust model of the multi-armed bandit (MAB) problem in which the transition probabilities are ambiguous and belong to subsets of the probability simplex. We first show that for each arm there exists a robust counterpart of the Gittins index that is the solution to a robust optimal stopping-time problem and can be computed effectively with an equivalent restart problem. We then characterize the optimal policy of the robust MAB as a project-by-project retirement policy but we show that arms become de…

Cited by 9 publications (2 citation statements)
References: 29 publications
“…As pointed out by Chan and Farias (2009), Brown et al (2010), and Rogers (2007), even without the added difficulty of accounting for model miss-specification, solving dynamic optimization problems to optimality is difficult and one must often resort to heuristics. As such, other papers in this research area have diverted their efforts to developing methods for bounding the performance of heuristic policies relative to the robust optimal (e.g., Caro and Gupta 2015, Kim and Lim 2015). However, very few results regarding the structure of the optimal robust policy have been published in the robust dynamic optimization literature, which is the focus of this paper.…”
Section: Introduction
mentioning
confidence: 99%
“…Thus $(B^{\mu}_t)$ takes the same role as $(W^{\eta}_t)$ in CE (see (1) above). Rewrite (7) as $dB^{\mu}_t = -\frac{1}{\sigma}\theta^{\mu}_t(Z_t)\,dt + {}$…”
mentioning
confidence: 99%