2020
DOI: 10.48550/arxiv.2012.13115
Preprint

Upper Confidence Bounds for Combining Stochastic Bandits

Ashok Cutkosky,
Abhimanyu Das,
Manish Purohit

Abstract: We provide a simple method to combine stochastic bandit algorithms. Our approach is based on a "meta-UCB" procedure that treats each of N individual bandit algorithms as arms in a higher-level N-armed bandit problem that we solve with a variant of the classic UCB algorithm. Our final regret depends only on the regret of the base algorithm with the best regret in hindsight. This approach provides an easy and intuitive alternative strategy to the CORRAL algorithm for adversarial bandits, without requiring the s…
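
To make the "algorithms as arms" idea concrete, here is a minimal Python sketch, assuming the plain textbook UCB1 index at the meta level. The abstract only says the authors use "a variant of the classic UCB algorithm" and is truncated, so this is not the paper's procedure: UCB1 assumes stationary rewards, which base algorithms that are still learning do not provide. The names `EpsilonGreedy`, `meta_ucb`, `pull`, and `demo` are hypothetical and chosen only for illustration.

```python
import math
import random


class EpsilonGreedy:
    """Hypothetical base bandit algorithm (an illustration, not from the paper)."""

    def __init__(self, n_arms, epsilon, rng):
        self.epsilon = epsilon
        self.rng = rng
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select_arm(self):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.means[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def meta_ucb(base_algorithms, pull, horizon):
    """Plain UCB1 over base algorithms, treating each one as a meta-arm.

    Assumption: classic UCB1 index, not the variant used in the paper.
    Returns (total_reward, plays), where plays[i] counts rounds given to base i.
    """
    n = len(base_algorithms)
    plays = [0] * n
    avg_reward = [0.0] * n
    total = 0.0

    def ucb_index(j, t):
        return avg_reward[j] + math.sqrt(2.0 * math.log(t) / plays[j])

    for t in range(1, horizon + 1):
        if t <= n:
            i = t - 1  # run every base algorithm once before using the index
        else:
            i = max(range(n), key=lambda j: ucb_index(j, t))
        base = base_algorithms[i]
        arm = base.select_arm()
        reward = pull(arm)
        base.update(arm, reward)  # only the selected base algorithm sees feedback
        plays[i] += 1
        avg_reward[i] += (reward - avg_reward[i]) / plays[i]
        total += reward
    return total, plays


def demo(horizon=2000, seed=0):
    """Combine a purely random learner with a mostly greedy one on 3 Bernoulli arms."""
    rng = random.Random(seed)
    arm_means = [0.2, 0.5, 0.8]  # made-up environment for the demo

    def pull(arm):
        return 1.0 if rng.random() < arm_means[arm] else 0.0

    bases = [EpsilonGreedy(3, 1.0, rng), EpsilonGreedy(3, 0.1, rng)]
    return meta_ucb(bases, pull, horizon)
```

Calling `demo()` returns the total reward and the number of rounds allocated to each base algorithm. In a sketch like this one would expect the meta level to shift rounds toward whichever base algorithm earns more reward, but the regret guarantee stated in the abstract applies to the authors' variant, not to this simplification.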

Citations: cited by 1 publication (1 citation statement)
References: 8 publications
“…One of the first such approaches is the Corral algorithm of Agarwal et al [3], which uses online mirror descent with the log-barrier regularizer as the meta-algorithm. Subsequent work focuses on adapting Corral to the stochastic setting [31,28] and developing UCB-style meta-algorithms [11,4]. These approaches are quite general and can often be used with abstract non-linear function classes.…”
Section: Related Work (mentioning)
confidence: 99%