Pareto Optimal Model Selection in Linear Bandits
Preprint, 2021
DOI: 10.48550/arxiv.2102.06593

Cited by 4 publications (6 citation statements)
References 11 publications
“…Foster et al [2019] assume the contexts are also drawn from an unknown distribution $x \sim D$ and propose an algorithm which does not incur more than $\tilde{O}\big(\tfrac{1}{\gamma^{3}}(i^{*}T)^{2/3}(M d_{i^{*}})^{1/3}\big)$ regret, where $\gamma$ is the smallest eigenvalue of the covariance matrix of feature embeddings $\Sigma = \mathbb{E}_{x\sim D}\big[\tfrac{1}{M}\sum_{a\in A}\phi_{M}(x,a)\phi_{M}(x,a)^{\top}\big]$. Pacchiano et al [2020b] propose a different approach based on the corralling algorithm of Agarwal et al [2017]. Zhu and Nowak [2021] show that it is impossible to achieve the desired regret guarantee of $\sqrt{d_{m^{*}}T}$ without additional assumptions, by proving a result similar to that of Lattimore [2015]. The work of Lattimore [2015] states that in the stochastic multi-armed bandit problem it is impossible to achieve $\sqrt{T}$ regret to a fixed arm without suffering at least $K\sqrt{T}$ regret to a different arm.…”
Section: Related Work (mentioning)
confidence: 96%
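
As a rough illustration of the Lattimore [2015] tradeoff quoted above (an informal paraphrase up to constants and log factors, not text from the indexed paper): the Pareto regret frontier result characterizes which per-arm regret budgets $(B_1,\dots,B_K)$ over horizon $T$ are simultaneously achievable, essentially requiring $B_i \gtrsim \sum_{j\neq i}\min\{B_i,\,T/B_j\}$ for every arm $i$. Demanding a $\sqrt{T}$ budget for one arm then forces an order $K\sqrt{T}$ budget on some other arm:

\[
B_1 = \sqrt{T}
\;\Rightarrow\;
\sum_{j\neq 1}\min\Big\{\sqrt{T},\,\tfrac{T}{B_j}\Big\} \lesssim \sqrt{T}
\;\Rightarrow\;
\exists\, j:\ \tfrac{T}{B_j} \lesssim \tfrac{\sqrt{T}}{K-1}
\;\Rightarrow\;
B_j \gtrsim (K-1)\sqrt{T}.
\]

Here the middle step uses that a sum of $K-1$ nonnegative terms bounded by $\sqrt{T}$ must contain a term of size at most $\sqrt{T}/(K-1)$, which for that arm can only be the $T/B_j$ branch of the minimum; this is the order-$K\sqrt{T}$ penalty referenced in the statement above.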
“…For example, [27] shows that, in $K$-armed bandit problems, one cannot simultaneously achieve $O(\sqrt{T})$ regret to one specific arm while guaranteeing $O(\sqrt{KT})$ regret to the rest. Such results have been extended to both linear and Lipschitz non-contextual bandits [41,30,23] as well as Lipschitz contextual bandits under margin assumptions [20], and they establish that model selection is not possible in these settings. However, these ideas have not been extended to standard contextual bandit settings, which is our focus.…”
Section: Related Work (mentioning)
confidence: 99%
“…This approach, however, is directly ruled out by Proposition 2, since such a doubling trick could end up solving a problem with a dimension $\tilde{d} \le 2d$ and yet $\rho_{\tilde{d}} \gg \rho_{d}$. Although the doubling trick over dimensions is commonly used to provide worst-case guarantees in regret minimization settings [13,21], we emphasize here that matching the instance-dependent complexity measure is a common goal in the pure exploration setting [9,6,16], and thus new techniques need to be developed. Proposition 2 also implies that trying to infer the value of $\rho_{d}$ from $\iota_{d}$ can be quite misleading.…”
Section: Failure of Standard Approaches (mentioning)
confidence: 99%
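
For readers unfamiliar with the doubling trick over dimensions that this statement rules out, here is a minimal, hypothetical sketch (the base-algorithm interface `run_base_algorithm` is an assumed placeholder, not code from the cited papers): the wrapper doubles its dimension guess until the guess looks large enough, so the final guess is at most twice the true dimension, but the samples spent along the way are governed by the complexities of the intermediate guesses rather than by $\rho_{d}$.

def doubling_over_dimensions(run_base_algorithm, d_max):
    # Hypothetical wrapper: `run_base_algorithm(d)` is assumed to run a base
    # routine under the working assumption that the true dimension is at most
    # `d`, and to report whether that assumption was consistent with the data.
    d_guess = 1
    while d_guess < d_max:
        answer, guess_was_large_enough = run_base_algorithm(d_guess)
        if guess_was_large_enough:
            return answer, d_guess   # stops at a guess of at most 2x the true dimension
        d_guess *= 2                 # otherwise, double the dimension guess
    answer, _ = run_base_algorithm(d_max)
    return answer, d_max

The quoted Proposition 2 says that even though such a loop stops at some $\tilde{d} \le 2d$, the instance-dependent complexity $\rho_{\tilde{d}}$ of the problem actually being solved can be far larger than $\rho_{d}$, so this worst-case-style reduction does not recover the instance-dependent guarantee targeted in pure exploration.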
“…In fact, cross-validation [18,17], a practical method for model selection, appears in almost all successful deployments of machine learning models. The model selection problem was recently introduced to the bandit regret minimization setting by [7], and further analyzed by [13,21]. [21] prove that only Pareto optimality can be achieved for regret minimization, which is even weaker than minimax optimality.…”
Section: Introduction (mentioning)
confidence: 99%