2021
DOI: 10.48550/arxiv.2110.10245
Preprint

Regret Minimization in Isotonic, Heavy-Tailed Contextual Bandits via Adaptive Confidence Bands

Abstract: In this paper we initiate a study of nonparametric contextual bandits under shape constraints on the mean reward function. Specifically, we study a setting where the context is one-dimensional and the mean reward function is isotonic with respect to this context. We propose a policy for this problem and show that it attains minimax rate-optimal regret. Moreover, we show that the same policy enjoys automatic adaptation; that is, for subclasses of the parameter space where the true mean reward functions are al…

Cited by 2 publications (2 citation statements)
References: 42 publications

“…Compared to the results in the literature for heavy-tailed bandits without a zero-inflated structure [12,18,14], our regret bound only introduces an additional term,…”
Section: Regret Bounds For UCB-type Algorithms (mentioning)
confidence: 73%
“…A natural question is whether these signal parameters can be estimated which would give a way to obtain finite sample confidence bands for the underlying function. Such elementwise confidence bands are important in application domains such as contextual bandits, see Chatterjee and Sen (2021).…”
Section: Discussion (mentioning)
confidence: 99%