2020
DOI: 10.48550/arxiv.2002.07530
Preprint
Improved Optimistic Algorithms for Logistic Bandits

Abstract: The generalized linear bandit framework has attracted considerable attention in recent years by extending the well-understood linear setting to allow richer reward structures. It notably covers the logistic model, widely used when rewards are binary. For logistic bandits, the frequentist regret guarantees of existing algorithms are Õ(κ√T), where κ is a problem-dependent constant. Unfortunately, κ can be arbitrarily large, as it scales exponentially with the size of the decision set. This may lead to…
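To illustrate why κ can blow up: in logistic bandits κ is (up to the problem's geometry) the worst-case inverse derivative of the sigmoid link over the decision set, and the sigmoid's derivative decays exponentially in |⟨x, θ⟩|. The sketch below is illustrative only (the function names and the radius parameter `S` are ours, not the paper's); it shows κ growing exponentially as the scale of the decision/parameter set grows.

```python
import numpy as np

def mu_dot(z):
    """Derivative of the logistic link mu(z) = 1 / (1 + exp(-z))."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def kappa(S):
    """Worst-case 1 / mu'(z) over |z| <= S; the minimum of mu' on
    [-S, S] is attained at the boundary, so evaluate at z = S."""
    return 1.0 / mu_dot(S)

# kappa grows exponentially with the scale S of <x, theta>:
for S in [1, 5, 10]:
    print(f"S = {S:2d}  kappa ~ {kappa(S):.1f}")
```

At S = 0 the sigmoid derivative is 1/4, so κ = 4; by S = 10 it is already above 20,000, which is the "arbitrarily large κ" problem the abstract refers to.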

Cited by 3 publications (21 citation statements)
References 5 publications
“…Unfortunately, w_t^MLE may not satisfy Assumption 1, so we instead make use of a projected version of w_t^MLE. Following Faury et al. (2020), and recalling Assumption 3, we define a data matrix and a transformation of w_t^MLE given by…”
Section: Maximum Likelihood Estimation
confidence: 99%
“…At each iteration t, we then compute an estimate w_t^L and determine a set of candidate policies S_t for which no other policy π significantly outperforms a member of S_t. The threshold for what… [footnote:] A slight modification in the expression of β_t(δ) is needed to incorporate the fact that we assume φ(τ) ≤ B for any τ (Assumption 2), while B = 1 in Faury et al. (2020). But this can be easily incorporated using Thm.…”
Section: Algorithm and Analysis
confidence: 99%
“…Specifically, logistic bandits, which are appropriate for modeling binary reward structures, are a special case of generalized linear bandits (GLBs) with µ(x) = 1/(1 + exp(−x)). UCB-based algorithms for GLBs were first introduced in [Filippi et al., 2010, Li et al., 2017, Faury et al., 2020]. The same problem, but with a Thompson Sampling (TS) strategy, was also studied in [Abeille et al., 2017, Russo and Van Roy, 2013, Russo and Van Roy, 2014, Dong and Van Roy, 2018].…”
Section: Introduction
confidence: 99%
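The logistic special case quoted above can be sketched in a few lines: a binary reward is drawn as a Bernoulli with success probability µ(⟨x, θ*⟩), where µ is the sigmoid link. This is a minimal sketch of that reward model only (not of any algorithm from the cited papers); `theta_star` and the arm vectors are illustrative values we chose.

```python
import numpy as np

rng = np.random.default_rng(0)

def mu(z):
    """Logistic link: mu(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

theta_star = np.array([0.6, -0.4])                 # unknown parameter (illustrative)
arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # decision set (illustrative)

def pull(x):
    """Binary reward: Bernoulli with success probability mu(<x, theta_star>)."""
    return rng.binomial(1, mu(x @ theta_star))

rewards = [pull(arms[0]) for _ in range(1000)]
print(sum(rewards) / 1000)  # empirical mean, close to mu(0.6) ≈ 0.646
```

The learner only observes the 0/1 rewards; estimating θ* from them (e.g. via the projected MLE quoted earlier) is what the cited algorithms address.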
“…For this model, we present an algorithm and a corresponding regret bound. Our algorithmic and analytic contribution is in large part inspired by very recent exciting progress on binary logistic bandits by [Faury et al., 2020].…”
Section: Introduction
confidence: 99%