2021 · Preprint
DOI: 10.48550/arxiv.2107.09912
Design of Experiments for Stochastic Contextual Linear Bandits

Cited by 1 publication (12 citation statements) · References 0 publications
“…Note that if the context set is deterministic, this objective corresponds to (the square root of) the G-optimality criterion in classical experimental design (see, e.g., Pukelsheim, 2006; Atkinson et al., 2007). For stochastic context sets, the objective was recently found to be closely related to linear contextual bandits and was studied by Ruan et al., 2020 and Zanette et al., 2021. Our ExplorationPolicy procedure is employed during each batch of the main algorithm to decide the policy used in the next batch.…”

Section: Single-batch Learning for the Exploration Policy
Confidence: 99%
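The G-optimality criterion mentioned above can be illustrated numerically for a deterministic (fixed) action set. By the Kiefer–Wolfowitz equivalence theorem, the optimal design weights w make the worst-case prediction variance max_x xᵀA(w)⁻¹x equal to the dimension d, where A(w) = Σᵢ wᵢ xᵢxᵢᵀ. The sketch below (hypothetical names; not code from the cited papers) uses the standard Frank–Wolfe exchange update to approximate such a design:

```python
import numpy as np

def g_optimal_design(X, n_iters=1000):
    """Approximate the G-optimal design over the rows of X (k actions, d dims)
    via the Frank-Wolfe exchange algorithm for D-/G-optimal designs."""
    k, d = X.shape
    w = np.full(k, 1.0 / k)                   # start from the uniform design
    for _ in range(n_iters):
        A = X.T @ (w[:, None] * X)            # design covariance sum_i w_i x_i x_i^T
        g = np.einsum("ij,jk,ik->i", X, np.linalg.inv(A), X)  # x^T A^{-1} x per action
        i = np.argmax(g)                      # currently most under-explored action
        gamma = (g[i] / d - 1.0) / (g[i] - 1.0)  # closed-form line-search step
        w = (1.0 - gamma) * w
        w[i] += gamma
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                  # 20 actions in R^3
w = g_optimal_design(X)
A = X.T @ (w[:, None] * X)
g_max = max(x @ np.linalg.solve(A, x) for x in X)  # should approach d = 3
```

At the optimum the Kiefer–Wolfowitz theorem guarantees `g_max` equals d exactly; the stochastic-context setting studied in the excerpt replaces the fixed set X with context distributions, which is what makes the objective bandit-specific.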
“…In contrast, thanks to a few key technical modifications and a new matrix concentration inequality (to be introduced in Section 1.1.4), our algorithm can directly learn the desired exploration policy with better performance, and is simpler to describe and implement. We also note that the concurrent work [Zanette et al., 2021] studied a task similar to our ExplorationPolicy. In Section 4.1, we compare our performance guarantee with the results in [Zanette et al., 2021] and demonstrate the superiority of our procedure.…”

Section: Single-batch Learning for the Exploration Policy
Confidence: 99%