2013
DOI: 10.2139/ssrn.2326583

Robust Control of the Multi-Armed Bandit Problem

Abstract: We study a robust model of the multi-armed bandit (MAB) problem in which the transition probabilities are ambiguous and belong to subsets of the probability simplex. We first show that for each arm there exists a robust counterpart of the Gittins index that is the solution to a robust optimal stopping-time problem and can be computed effectively with an equivalent restart problem. We then characterize the optimal policy of the robust MAB as a project-by-project retirement policy, but we show that arms become de…
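The restart-problem computation mentioned in the abstract can be illustrated concretely. Below is a minimal sketch, assuming a finite ambiguity set given as a list of candidate transition matrices (the paper allows general subsets of the probability simplex) and a rectangular worst case taken state by state; the function name robust_gittins_index and all parameters are illustrative, not the authors' implementation.

```python
import numpy as np

def robust_gittins_index(rewards, candidate_Ps, s0, beta=0.9, tol=1e-8):
    """Robust Gittins index of state s0 for a single arm.

    rewards      : (n,) array of immediate rewards r(x)
    candidate_Ps : list of (n, n) row-stochastic matrices (the ambiguity set)
    s0           : state whose index is computed
    beta         : discount factor in (0, 1)
    """
    rewards = np.asarray(rewards, dtype=float)
    V = np.zeros(len(rewards))
    while True:
        # Robust Bellman step: nature picks the worst-case transition law,
        # state by state, from the (finite, rectangular) ambiguity set.
        worst_EV = np.min(np.stack([P @ V for P in candidate_Ps]), axis=0)
        cont = rewards + beta * worst_EV          # continue the arm
        restart = cont[s0]                        # or restart the arm in s0
        V_new = np.maximum(cont, restart)
        if np.max(np.abs(V_new - V)) < tol:
            return (1.0 - beta) * V_new[s0]       # Katehakis-Veinott identity
        V = V_new
```

A small usage example, with the transition law ambiguous between a "sticky" and a "leaky" matrix:

```python
r = [1.0, 0.0]
P_sticky = np.array([[0.9, 0.1], [0.1, 0.9]])
P_leaky  = np.array([[0.6, 0.4], [0.1, 0.9]])
print(robust_gittins_index(r, [P_sticky, P_leaky], s0=0))
```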

Cited by 6 publications (19 citation statements: 0 supporting, 19 mentioning, 0 contrasting; years published 2016–2022) · References 22 publications
“…The study of an adversary for the payoff in the bandit problem (via Gittins index) has been considered by Caro and Gupta [20] and Kim and Lim [51] (with additional penalty in the reward) using Whittle's retirement option argument [76]. In their works, they rely heavily on a Markov assumption, which allows them to postulate a robust dynamic programming principle (see also Iyengar [46], Nilim and El Ghaoui [58]).…”
Section: Multi-armed Bandits (mentioning)
confidence: 99%
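For reference, the robust dynamic programming principle invoked in this citation takes the following standard form under rectangular ambiguity, in the spirit of Iyengar and of Nilim and El Ghaoui (notation illustrative, not taken from the cited papers):

$$
V(x) \;=\; \max_{a}\Big\{\, r(x,a) \;+\; \beta \inf_{p \in \mathcal{P}(x,a)} \sum_{y} p(y)\, V(y) \Big\},
$$

where $\mathcal{P}(x,a)$ is the ambiguity set of transition laws for the state-action pair $(x,a)$ and the inner infimum is nature's adversarial choice. The Markov (rectangularity) assumption mentioned in the quote is what allows the infimum to be taken state by state.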
“…For example, statistical concerns are treated in this framework in [23,24] or Bielecki, Cialenco and Chen [14]. We could also allow adversarial choices with a range of a fixed set (as in the classical adversarial bandit problem [7] or as in [20,51]) or a random set which can be used to model learning as in the classical Gittins' theory. We also allow dynamic adversaries, which are not considered in the usual adversarial setting.…”
Section: Multi-armed Bandits (mentioning)
confidence: 99%
“…We know of one other paper on robust bandit problems, which is the recent paper by Caro and Gupta (2015). We briefly highlight the differences.…”
Section: Relevant Literature (mentioning)
confidence: 99%
“…We briefly highlight the differences. First, our paper adopts a different approach for expressing model uncertainty; Caro and Gupta (2015) adopts the constraint approach whereas we adopt the penalty approach. The differences between these approaches are discussed in §3.1.…”
Section: Relevant Literature (mentioning)
confidence: 99%
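The constraint-versus-penalty distinction highlighted in this citation can be made precise with a standard pair of formulations from the robust-control literature (notation illustrative, not taken from either paper). The constraint approach restricts nature to an ambiguity set $\mathcal{Q}$:

$$
\sup_{\pi}\ \inf_{Q \in \mathcal{Q}}\ \mathbb{E}^{Q}\Big[\sum_{t \geq 0} \beta^{t}\, r(X_t, \pi_t)\Big],
$$

whereas the penalty approach leaves nature unconstrained but charges a relative-entropy penalty against a nominal model $P$:

$$
\sup_{\pi}\ \inf_{Q}\ \Big\{\, \mathbb{E}^{Q}\Big[\sum_{t \geq 0} \beta^{t}\, r(X_t, \pi_t)\Big] \;+\; \theta\, R(Q \,\|\, P) \Big\},
$$

with $\theta > 0$ controlling the strength of the robustness preference. By Lagrangian duality the two formulations are closely related, which is why the comparison between the two papers hinges on modeling convenience rather than on fundamentally different objectives.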