Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing 2014
DOI: 10.1145/2591796.2591868
Bandits with switching costs

Abstract: We study the adversarial multi-armed bandit problem in a setting where the player incurs a unit cost each time he switches actions. We prove that the player's T-round minimax regret in this setting is Θ(T^{2/3}), thereby closing a fundamental gap in our understanding of learning with bandit feedback. In the corresponding full-information version of the problem, the minimax regret is known to grow at a much slower rate of Θ(√T). The difference between these two rates provides the first indication that learni…
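The standard way to match the O(T^{2/3}) upper bound under unit switching costs is to run Exp3 over blocks of length roughly T^{1/3}: the player commits to one arm per block, so the number of switches is bounded by the number of blocks. The sketch below is a minimal illustration of that batching construction, not the paper's own algorithm; the function names (`batched_exp3`, `loss_fn`) and the simplified learning-rate choice are assumptions for the example.

```python
import math
import random

def batched_exp3(loss_fn, n_arms, T, seed=0):
    """Batched Exp3: one arm per block of length ~T^(1/3), so the
    number of switches is at most the number of blocks, ~T^(2/3).
    (Illustrative sketch; constants and tuning are simplified.)"""
    rng = random.Random(seed)
    B = max(1, round(T ** (1 / 3)))        # block length
    n_blocks = (T + B - 1) // B            # ~T^(2/3) blocks
    eta = math.sqrt(2 * math.log(n_arms) / (n_blocks * n_arms))
    weights = [1.0] * n_arms
    total_loss, switches, prev_arm, t = 0.0, 0, None, 0
    for _ in range(n_blocks):
        z = sum(weights)
        probs = [w / z for w in weights]
        arm = rng.choices(range(n_arms), probs)[0]
        if prev_arm is not None and arm != prev_arm:
            switches += 1
        prev_arm = arm
        block_loss = 0.0
        for _ in range(min(B, T - t)):     # play the arm for the whole block
            block_loss += loss_fn(t, arm)  # bandit feedback only
            t += 1
        total_loss += block_loss
        # importance-weighted estimate of the block's average loss
        est = (block_loss / B) / probs[arm]
        weights[arm] *= math.exp(-eta * est)
    return total_loss, switches
```

With block length B, the Exp3 regret over the T/B meta-rounds scales as √(B·T·K·log K) while the switching cost is at most T/B; choosing B ≈ T^{1/3} balances both terms at order T^{2/3}, which is why the lower bound proved in the paper shows this rate cannot be improved.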

Cited by 62 publications (116 citation statements). References 14 publications.
“…3) We prove that the expected weak regret of SpecWatch-II is O(T^{2/3}), which matches the lower bound in [25]. Therefore, SpecWatch-II is asymptotically optimal.…”
Section: Introduction (supporting)
confidence: 67%
“…in a similar fashion as in Dekel et al. [6], upon which we build our adaptive loss sequences with memory of size 1. Formally, Algorithm 5 in Appendix D.1 is used to generate the oblivious loss sequences L_{1:T}.…”
Section: Adversaries With Bounded Memory Of Size (mentioning)
confidence: 99%
“…Unfortunately, although the Exp3.G algorithm is minimax optimal for all strongly-observable graphs when the adversary is oblivious, this is not true when the game involves switching costs. It is known that certain strategies exist which incur O(√T) regret in the full-information game, whereas the multi-armed bandit problem suffers a lower bound of Ω(T^{2/3}) [6], even though both games are induced by strongly-observable graphs. Hence, some strongly-observable games are more difficult than others.…”
Section: Non-revealing Strongly-observable Games With Switching Costs (mentioning)
confidence: 99%