2003
DOI: 10.1007/978-3-540-45167-9_31
Lower Bounds on the Sample Complexity of Exploration in the Multi-armed Bandit Problem

Cited by 162 publications (268 citation statements)
References 8 publications
“…Upper confidence bounds are also central to the design of multi-armed bandit problems in the PAC setting [EDMM06,MT04], where the algorithm's objective is to identify an arm that is ε-optimal with probability at least 1 − δ. Our work adopts a very different feedback model (pairwise comparisons rather than direct observation of payoffs) and a different objective (regret minimization rather than the PAC objective) but there are clear similarities between our IF1 and IF2 algorithms and the Successive Elimination and Median Elimination algorithms developed for the PAC setting in [EDMM06].…”
Section: Related Work
confidence: 99%
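To make the (ε, δ)-PAC objective mentioned in the quotation concrete, here is a minimal, illustrative sketch of a Successive Elimination style procedure for best-arm identification. The function name, the confidence-radius constant, and the stopping rule are assumptions chosen for illustration; they are not taken verbatim from [EDMM06].

```python
import math
import random

def successive_elimination(arms, epsilon, delta):
    """Sketch of Successive Elimination for (epsilon, delta)-PAC best-arm identification.

    arms: list of zero-argument callables returning rewards in [0, 1].
    Returns the index of an arm intended to be epsilon-optimal with
    probability at least 1 - delta (constants are illustrative, not tuned).
    """
    n = len(arms)
    active = list(range(n))      # arms still in contention
    means = [0.0] * n            # empirical mean reward per arm
    t = 0                        # pulls per active arm so far

    while len(active) > 1:
        t += 1
        # Pull every active arm once this round and update its running mean.
        for i in active:
            reward = arms[i]()
            means[i] += (reward - means[i]) / t

        # Hoeffding-style confidence radius (illustrative union-bound constant).
        radius = math.sqrt(math.log(4.0 * n * t * t / delta) / (2.0 * t))
        best = max(means[i] for i in active)

        # Drop arms whose upper confidence bound falls below the leader's
        # lower confidence bound.
        active = [i for i in active if means[i] + radius >= best - radius]

        # Stop once the surviving arms are within epsilon of each other.
        if 2.0 * radius <= epsilon:
            break

    return max(active, key=lambda i: means[i])


# Illustrative usage with three Bernoulli arms (true means 0.5, 0.6, 0.8).
arms = [lambda p=p: 1.0 if random.random() < p else 0.0 for p in (0.5, 0.6, 0.8)]
print(successive_elimination(arms, epsilon=0.1, delta=0.05))
```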
“…The regret of established bandit algorithms such as UCB1 (Auer et al, 2002) is logarithmic in the number of steps, but grows linearly with the number of arms. This is also best possible (Mannor and Tsitsiklis, 2004).…”
Section: Colored Bandits
confidence: 94%
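For reference, below is a minimal sketch of the UCB1 index policy discussed in the quoted passage, assuming rewards in [0, 1]. The exploration constant and bookkeeping follow the standard textbook form; this is not code from Auer et al. (2002).

```python
import math
import random

def ucb1(arms, horizon):
    """Sketch of the UCB1 index policy for stochastic bandits.

    arms: list of zero-argument callables returning rewards in [0, 1].
    Regret grows logarithmically in `horizon` but linearly in the number
    of arms, the dependence the quoted lower bound shows is unavoidable.
    """
    n = len(arms)
    counts = [0] * n             # pulls per arm
    means = [0.0] * n            # empirical mean reward per arm

    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1          # initialization: pull each arm once
        else:
            # Choose the arm maximizing the upper confidence index.
            arm = max(range(n),
                      key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = arms[arm]()
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]

    return means, counts


# Illustrative usage with the same Bernoulli arms as above.
arms = [lambda p=p: 1.0 if random.random() < p else 0.0 for p in (0.5, 0.6, 0.8)]
print(ucb1(arms, horizon=5000)[1])   # pull counts should concentrate on the best arm
```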
“…This suggests a PAC-MDP algorithm can be used to learn the bandit with p(a) := p ⊕ 1,a . We then make use of a theorem of Mannor and Tsitsiklis on bandit sample-complexity [MT04] to show that with high probability the number of times a* is not selected is at least…”
Section: Fig 1 Hard MDP
confidence: 99%
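For context on the truncated quotation above (the elided bound is left as in the source): the lower bound of Mannor and Tsitsiklis [MT04] that such arguments invoke states, roughly, that any (ε, δ)-PAC algorithm for an n-armed bandit must draw Ω((n/ε²) log(1/δ)) samples in expectation on some instance; the precise constants and the hard instance family are given in the paper itself.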