2021
DOI: 10.48550/arxiv.2110.08627
Preprint

Achieving the Pareto Frontier of Regret Minimization and Best Arm Identification in Multi-Armed Bandits

citations
Cited by 1 publication
(1 citation statement)
references
References 0 publications
“…While with f(j) = 2 log(j) the policy correctly determined all b fastest workers in ten simulation runs, with f(j) = 2 log(j) μ_min in one out of ten simulations the algorithm commits to a worker with a suboptimality gap of 0.1. This reflects the trade-off between the competing objectives of best arm identification and regret minimization discussed in [42]. However, since the fastest workers have been determined eventually with an accuracy of 99.5%, the proposed adapted confidence bound seems reasonable [to] improve the convergence rate.…”
Section: Simulation Results (mentioning)
confidence: 87%