2019
DOI: 10.48550/arxiv.1910.09714
Preprint

Smoothness-Adaptive Contextual Bandits

Abstract: We study a non-parametric multi-armed bandit problem with stochastic covariates, where a key driver of complexity is the smoothness with which the payoff functions vary with covariates. Previous studies have derived minimax-optimal algorithms in cases where it is a priori known how smooth the payoff functions are. In practice, however, advance information about the smoothness of payoff functions is typically not available, and misspecification of smoothness may severely deteriorate the performance of existing …
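
For context on the smoothness notion at the heart of the abstract (our gloss, not text from the paper): in this literature the payoff function f_k of each arm k is typically assumed to be Hölder-smooth in the covariate, in the simplest case

\[
  |f_k(x) - f_k(x')| \;\le\; L \,\|x - x'\|^{\beta} \qquad \text{for all covariates } x, x',
\]

for some exponent \(\beta\) (often taken in \((0, 1]\)) and constant \(L > 0\). The achievable regret rate depends on \(\beta\), which is why misspecifying the smoothness, as the abstract warns, can be costly.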

Cited by 2 publications (2 citation statements)
References 38 publications

“…Is it possible to design policies that achieve the optimal regret without explicit knowledge of the underlying smoothness? This question was explored in [28], who discovered that this is not possible in general. They established that the absence of adaptive policies arises from the non-existence of adaptive confidence sets in non-parametric regression under smoothness classes.…”
Section: Background and Comparisons With Existing Literature
Confidence: 99%

“…A third set of bandit algorithms that does not fall neatly into either of the two categories above consists of algorithms that allow for a non-parametric model class. For example, in Rigollet and Zeevi (2010) and Perchet et al. (2013) the reward model is assumed to be Hölder continuous but non-differentiable, and in Hu et al. (2020) and Gur et al. (2019) it satisfies a Hölder smoothness assumption. The main characteristic of this class of algorithms is that they partition the covariate space into hypercubes of appropriate size and run multi-armed bandit algorithms within each cube.…”
Section: Introduction
Confidence: 99%
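
The hypercube-partitioning scheme described in the excerpt above is easy to illustrate. The sketch below is a hypothetical toy in Python, not code from any of the cited papers: it grids the covariate space [0, 1]^d into equal cubes and runs an independent UCB1 routine inside each cube. The number of bins, the reward functions, the noise level, and the horizon are arbitrary illustrative choices rather than the smoothness-tuned values the cited algorithms prescribe.

```python
import numpy as np

class BinnedUCB:
    """Toy 'UCB-in-each-hypercube' policy for covariates in [0, 1]^d."""

    def __init__(self, n_arms, d, n_bins):
        self.n_arms, self.d, self.n_bins = n_arms, d, n_bins
        # Pull counts and empirical mean rewards, indexed by (cube, arm).
        shape = (n_bins,) * d + (n_arms,)
        self.counts = np.zeros(shape)
        self.means = np.zeros(shape)

    def _cube(self, x):
        # Map a covariate x in [0, 1]^d to the index of the hypercube containing it.
        return tuple(np.minimum((x * self.n_bins).astype(int), self.n_bins - 1))

    def select(self, x, t):
        cube = self._cube(x)
        counts, means = self.counts[cube], self.means[cube]
        if np.any(counts == 0):                 # play every arm once inside each cube
            return int(np.argmin(counts))
        ucb = means + np.sqrt(2.0 * np.log(t + 1) / counts)
        return int(np.argmax(ucb))

    def update(self, x, arm, reward):
        idx = self._cube(x) + (arm,)
        self.counts[idx] += 1
        self.means[idx] += (reward - self.means[idx]) / self.counts[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two arms whose mean rewards vary smoothly (hence Hölder-continuously) with a 1-d covariate.
    payoff = [lambda x: 0.5 + 0.3 * np.sin(4.0 * x[0]),
              lambda x: 0.5 + 0.3 * np.cos(4.0 * x[0])]
    policy = BinnedUCB(n_arms=2, d=1, n_bins=8)
    regret = 0.0
    for t in range(5000):
        x = rng.random(1)                       # stochastic covariate
        arm = policy.select(x, t)
        reward = payoff[arm](x) + 0.1 * rng.standard_normal()
        policy.update(x, arm, reward)
        regret += max(f(x) for f in payoff) - payoff[arm](x)
    print(f"cumulative regret after 5000 rounds: {regret:.1f}")
```

Under a known smoothness level, the cited works choose the cube side length as a function of the Hölder exponent and the horizon; the point of the surveyed paper is precisely that such tuning is difficult when the smoothness is not known in advance.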