2019
DOI: 10.48550/arxiv.1911.00980
Preprint

Zeroth Order Non-convex Optimization with Dueling-Choice Bandits

Yichong Xu, Aparna Joshi, Aarti Singh, et al.

Abstract: We consider a novel setting of zeroth order non-convex optimization, where in addition to querying the function value at a given point, we can also duel two points and get the point with the larger function value. We refer to this setting as optimization with dueling-choice bandits since both direct queries and duels are available for optimization. We give the COMP-GP-UCB algorithm based on GP-UCB (Srinivas et al., 2009), where instead of directly querying the point with the maximum Upper Confidence Bound (UCB…
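The abstract is truncated before the full description of COMP-GP-UCB, so the sketch below illustrates only the dueling-choice setting itself: duel two promising candidates, then spend the direct value query on the winner. It is a toy Python sketch under stated assumptions (a synthetic 1-D objective, a noiseless duel oracle, an arbitrary UCB coefficient), not the authors' algorithm.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hypothetical 1-D objective; in the dueling-choice setting we can both
# query f(x) directly and duel two points to learn which value is larger.
def f(x):
    return np.sin(3.0 * x) - 0.5 * x ** 2

def duel(x1, x2):
    # Noiseless duel oracle: returns the point with the larger value.
    return x1 if f(x1) >= f(x2) else x2

rng = np.random.default_rng(0)
grid = np.linspace(-2.0, 2.0, 200).reshape(-1, 1)

# Seed the GP surrogate with a few direct value queries.
X = rng.uniform(-2.0, 2.0, size=(3, 1))
y = f(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for t in range(20):
    gp.fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    ucb = mu + 2.0 * sigma  # illustrative exploration coefficient

    # Duel the two highest-UCB candidates, then pay for a direct
    # value query only at the duel winner.
    a, b = np.argsort(ucb)[-2:]
    winner = duel(grid[a], grid[b])

    X = np.vstack([X, winner.reshape(1, 1)])
    y = np.append(y, f(winner))

i = int(np.argmax(y))
print(f"best point ~ {X[i, 0]:.3f}, value ~ {y[i]:.3f}")
```

The point of the duel step is that a comparison can screen candidates before the (typically more expensive) direct query; the actual COMP-GP-UCB selection rule is not reproduced here.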

Cited by 1 publication (1 citation statement)
References: 19 publications

“…PbRL is relevant to several settings in Multi-Armed Bandits. Dueling bandits [10,32] is essentially the one-state version of PbRL, and has been extensively studied in the literature [15,16,31,33]. However, PbRL is significantly harder because in PbRL the observation (preference) is based on the sum of rewards on a trajectory rather than individual reward values.…”
Section: Introduction
Confidence: 99%
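As a toy illustration of the distinction the quoted passage draws (the rewards, arms, and trajectories below are made-up values, not from either paper): a dueling-bandit comparison reveals which of two arms has the larger individual reward, while a PbRL preference only reveals which of two trajectories has the larger summed reward.

```python
# Dueling bandits: the preference compares two arms' individual rewards.
def duel_feedback(reward, a, b):
    return a if reward[a] >= reward[b] else b

# PbRL: the preference compares summed rewards along whole trajectories,
# so individual reward values are never observed.
def trajectory_feedback(reward, traj1, traj2):
    r1 = sum(reward[s] for s in traj1)
    r2 = sum(reward[s] for s in traj2)
    return traj1 if r1 >= r2 else traj2

reward = {0: 0.1, 1: 0.9, 2: 0.4}
print(duel_feedback(reward, 0, 1))               # arm-level comparison -> 1
print(trajectory_feedback(reward, [0, 2], [1]))  # return-level comparison -> [1]
```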