2016
DOI: 10.1287/mnsc.2015.2153
Robust Multiarmed Bandit Problems

Abstract: The multiarmed bandit problem is a popular framework for studying the exploration versus exploitation trade-off. Recent applications include dynamic assortment design, Internet advertising, dynamic pricing, and the control of queues. The standard mathematical formulation for a bandit problem makes the strong assumption that the decision maker has a full characterization of the joint distribution of the rewards, and that "arms" under this distribution are independent. These assumptions are not satisfied in man…

Cited by 48 publications (32 citation statements)
References 58 publications
“…In this regard, one can view P and T as tuning parameters that parametrize a family of policies of increasing robustness with decreasing P and T. In practice, to select specific values of the ambiguity parameters using historical data, the parameters can be tuned using out-of-sample tests (see, for example, the recent papers of Lim 2015 and Gotoh et al. 2016, who consider this issue in detail and propose a number of robust cross-validation procedures to select ambiguity parameters from data).…”
Section: The Robust Model
confidence: 99%
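The statement above describes choosing ambiguity parameters by out-of-sample tests on historical data. A minimal sketch of that idea for a single ambiguity parameter θ, using a soft-robust (entropy-penalized) arm evaluation and simple train/validation splits. All function names, the reward model, and the grid of θ values are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical historical reward samples for two arms (illustrative only).
history = rng.normal(loc=[0.5, 0.6], scale=1.0, size=(200, 2))

def robust_arm_values(samples, theta):
    """Entropy-penalized certainty equivalent of each arm:
    -theta * log E[exp(-reward / theta)].
    Smaller theta -> more conservative (more robust) evaluation."""
    return -theta * np.log(np.mean(np.exp(-samples / theta), axis=0))

def out_of_sample_score(train, valid, theta):
    """Pick the arm that looks best on the training half under the robust
    criterion, then score that choice by its mean reward on the
    validation half."""
    best_arm = np.argmax(robust_arm_values(train, theta))
    return valid[:, best_arm].mean()

# Robust cross-validation over a grid of candidate ambiguity parameters.
thetas = [0.1, 0.5, 1.0, 5.0, 50.0]
scores = []
for theta in thetas:
    fold_scores = []
    for _ in range(5):
        idx = rng.permutation(len(history))
        train, valid = history[idx[:100]], history[idx[100:]]
        fold_scores.append(out_of_sample_score(train, valid, theta))
    scores.append(np.mean(fold_scores))

best_theta = thetas[int(np.argmax(scores))]
print("selected ambiguity parameter:", best_theta)
```

The grid search and the 5-fold splitting are the simplest possible stand-ins for the robust cross-validation procedures the cited papers develop in detail.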
“…In the constraint approach, the set of alternative models is represented as a hard constraint, and confidence in the nominal is captured by the size of this uncertainty set (see, e.g., Ben-Tal and Nemirovski 1998, 1999, 2000; Bertsimas and Sim 2004; El Ghaoui and Lebret 1997; Iyengar 2005; Li and Kwon 2013; Nilim and El Ghaoui 2005; Wiesemann et al. 2013). The penalty approach, on the other hand, expresses confidence in the nominal by penalizing alternative models that deviate too far from the nominal, and does so via a penalty function (soft constraint) that appears in the objective function (see, e.g., Dai Pra et al. 1996, Peterson et al. 2000, Hansen and Sargent 2007, Jain et al. 2010, Kim and Lim 2015, Lim and Shanthikumar 2007). In this paper, we adopt an entropy penalty approach to represent model misspecification, because it provides a number of advantages from the perspective of characterizing the structure of the optimal robust control policy.…”
Section: Introduction
confidence: 99%
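The entropy penalty approach described above replaces a hard uncertainty set with a Kullback-Leibler penalty on deviations from the nominal model: the adversary solves min over Q of E_Q[X] + θ·KL(Q‖P), whose minimizer is an exponential tilt of P toward low rewards and whose optimal value has the closed form -θ·log E_P[exp(-X/θ)]. A small numerical check of that identity on a finite nominal distribution (the specific numbers are illustrative):

```python
import numpy as np

theta = 0.7                      # penalty weight: confidence in the nominal
rewards = np.array([1.0, 2.0, 4.0])
p = np.array([0.5, 0.3, 0.2])    # nominal distribution P

# Closed-form penalized worst-case value: -theta * log E_P[exp(-X/theta)].
closed_form = -theta * np.log(np.sum(p * np.exp(-rewards / theta)))

# The minimizing model Q* is the exponential tilt of P toward low rewards.
q = p * np.exp(-rewards / theta)
q /= q.sum()

# Evaluate the penalized objective at Q*: E_Q[X] + theta * KL(Q || P).
kl = np.sum(q * np.log(q / p))
penalized_value = np.sum(q * rewards) + theta * kl

# The two computations agree, and the robust value sits below the
# nominal expected reward.
assert np.isclose(closed_form, penalized_value)
print(round(closed_form, 4))
```

Sending θ to infinity recovers the nominal expectation (full confidence in P), while small θ makes the tilted model, and hence the evaluation, increasingly pessimistic.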
“…With this approach, a bandit process is naturally formulated as a Markov decision process, the solution of which is often obtained through stochastic dynamic programming. In addition to the aforementioned applications in business and clinical trials, the bandit model is frequently used as a theoretical framework for many other research problems, including stochastic scheduling, queueing networks, optimal investment and consumption, dynamic assortment design, modern online service, and webpage design…”
Section: Introduction
confidence: 99%
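The excerpt above notes that a bandit process can be cast as a Markov decision process and solved by stochastic dynamic programming. A self-contained sketch for a two-armed Bernoulli bandit where the MDP state is the posterior success/failure count of each arm under a uniform prior, solved by backward induction over a finite horizon (the horizon and the priors are illustrative choices, not from the paper):

```python
from functools import lru_cache

HORIZON = 10  # illustrative finite horizon

@lru_cache(maxsize=None)
def value(t, s1, f1, s2, f2):
    """Optimal expected total reward from period t onward.
    State = success/failure counts of each arm, i.e. a
    Beta(1 + s, 1 + f) posterior for each arm's success probability."""
    if t == HORIZON:
        return 0.0
    # Posterior mean success probability of each arm.
    p1 = (1 + s1) / (2 + s1 + f1)
    p2 = (1 + s2) / (2 + s2 + f2)
    # Bellman recursion: pull an arm, observe the outcome, update the state.
    pull1 = p1 * (1 + value(t + 1, s1 + 1, f1, s2, f2)) \
        + (1 - p1) * value(t + 1, s1, f1 + 1, s2, f2)
    pull2 = p2 * (1 + value(t + 1, s1, f1, s2 + 1, f2)) \
        + (1 - p2) * value(t + 1, s1, f1, s2, f2 + 1)
    return max(pull1, pull2)

v = value(0, 0, 0, 0, 0)
print(round(v, 4))
```

The myopic policy would collect an expected reward of 0.5 per period here; the dynamic-programming value exceeds that baseline precisely because early pulls also buy information, which is the exploration-exploitation trade-off the bandit literature studies.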
“…This approach has been extended to finite state and action MDPs with incomplete information (Burnetas and Katehakis [6]) and to adversarial bandits that either make no assumption whatsoever on the process generating the payoffs of the bandits (Auer et al. [1]) or bound its variation within a "variation budget" (Besbes et al. [4]). At the time of submission we became aware of the work by Kim and Lim [18], which also studies the RMAB problem but with an alternative formulation in which deviations of the transition probabilities from their point estimates are penalized, so the analysis is essentially different from ours.…”
Section: Introduction
confidence: 99%