Bandits with switching costs

Dekel, Ofer; Ding, Jian; Koren, Tomer; Peres, Yuval

doi:10.1145/2591796.2591868

Cited by 62 publications (116 citation statements)

References 14 publications

Citation statements by type: Supporting, Mentioning (113), Contrasting


“…3) We prove that the expected weak regret of SpecWatch-II is O(T^{2/3}), which matches the lower bound in [25]. Therefore, SpecWatch-II is asymptotically optimal.…”

Section: Introduction (supporting)

confidence: 67%

SpecWatch: A framework for adversarial spectrum monitoring with unknown statistics

Yang, Lin, et al. 2018

Computer Networks


In cognitive radio networks (CRNs), dynamic spectrum access has been proposed to improve spectrum utilization, but it also creates spectrum misuse problems. One common solution is to deploy monitors that detect misbehavior on certain channels. In multi-channel CRNs, however, deploying monitors on every channel is very costly. With a limited number of monitors, we must decide which channels to monitor. We also need to determine how long to monitor each channel and in which order, because switching channels incurs costs. Moreover, information about the misuse behavior is not available a priori. To answer these questions, we model the spectrum monitoring problem as an adversarial multi-armed bandit problem with switching costs (MAB-SC), propose an effective framework, and design two online algorithms, SpecWatch-II and SpecWatch-III, based on this framework. To evaluate the algorithms, we use weak regret, i.e., the performance difference between the solution of our algorithm and the optimal (fixed) solution in hindsight, as the metric. We prove that the expected weak regret of SpecWatch-II is O(T^{2/3}), where T is the time horizon, whereas the actual weak regret of SpecWatch-III is O(T^{2/3}) with probability 1 − δ, for any δ ∈ (0, 1). Both upper bounds match the lower bound of the general adversarial MAB-SC problem, so both algorithms are asymptotically optimal.

Index Terms: Cognitive Radio Networks, Spectrum Monitoring, Multi-armed Bandit Problem.

This is an extended and enhanced version of the paper [50] that appeared in INFOCOM 2016.
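The weak-regret metric described in this abstract can be made concrete with a small calculation. The sketch below is a hypothetical illustration, not SpecWatch's code: it assumes a loss (cost) formulation in which every change of arm (channel) is charged a fixed `switch_cost`, and the names `weak_regret_with_switching`, `losses`, `actions`, and `switch_cost` are introduced here for illustration only. The comparator is the best fixed arm in hindsight, which never switches.

```python
from typing import Sequence


def weak_regret_with_switching(
    losses: Sequence[Sequence[float]],
    actions: Sequence[int],
    switch_cost: float = 1.0,
) -> float:
    """Weak regret when each change of arm costs `switch_cost` (hypothetical model).

    losses[t][i] is the loss of arm i at round t (the full table, known in hindsight).
    actions[t] is the arm the learner played at round t.
    """
    if len(losses) != len(actions):
        raise ValueError("need exactly one action per round")
    # Loss the learner actually incurred on the arms it played.
    incurred = sum(losses[t][a] for t, a in enumerate(actions))
    # One charge for every round whose arm differs from the previous round's arm.
    switches = sum(1 for prev, cur in zip(actions, actions[1:]) if prev != cur)
    incurred += switch_cost * switches
    # The best fixed arm in hindsight never switches, so it pays no switching cost.
    n_arms = len(losses[0])
    best_fixed = min(sum(row[i] for row in losses) for i in range(n_arms))
    return incurred - best_fixed
```

For example, with `losses = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]`, alternating `[0, 1, 0, 1]` picks up zero loss but three switches, so its weak regret is 3 − 2 = 1.0, while staying on arm 0 gives 0.0. The switching term is what penalizes learners that chase every fluctuation.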


“… in a similar fashion as in Dekel et al. [6], upon which we build our adaptive loss sequences with memory of size 1. Formally, Algorithm 5 in Appendix D.1 is used to generate the oblivious loss sequences L_{1:T}.…”

Section: Adversaries With Bounded Memory Of Size (mentioning)

confidence: 99%

“…Unfortunately, although the Exp3.G algorithm is minimax optimal for all strongly-observable graphs when the adversary is oblivious, this is not true when the game involves switching costs. It is known that certain strategies exist which incur O(√T) regret in the full-information game, whereas the multi-armed bandit problem suffers a lower bound of Ω(T^{2/3}) [6], even though both games are induced by strongly-observable graphs. Hence, some strongly-observable games are more difficult than others.…”

Section: Non-revealing Strongly-observable Games With Switching Costs (mentioning)

confidence: 99%
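Where the T^{2/3} rate comes from in the bandit case can be seen from a standard blocking argument. This is a textbook technique, not necessarily the construction used by Dekel et al. or by the citing papers. Assumptions of the sketch: K arms, oblivious losses in [0, 1], a cost c per switch, and an Exp3 variant whose expected regret over n rounds with losses in [0, 1] is at most C√(nK ln K). The learner commits to one arm for each block of τ rounds and runs Exp3 on the T/τ block losses, which lie in [0, τ]; integer rounding of τ is ignored.

```latex
\mathbb{E}[R_T]
  \le \tau \, C\sqrt{\frac{T}{\tau}\, K \ln K} + c\,\frac{T}{\tau}
  = C\sqrt{\tau\, T K \ln K} + c\,\frac{T}{\tau},
\qquad
\tau = T^{1/3}(K \ln K)^{-1/3}
\;\Longrightarrow\;
\mathbb{E}[R_T] \le (C + c)\, T^{2/3} (K \ln K)^{1/3}.
```

The first term is the bandit estimation error of Exp3 scaled to block losses; the second counts at most one switch per block. Balancing the two is what fixes the exponent 2/3.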

“…By the enumeration in Figure 3 of Appendix D, such a graph must contain a two-node subgraph G_1 that preserves the observability of G. By Lemma 10 in Appendix D.3, the game induced by G is at least as hard as any game on G_1. The game on G_1 is simply a bandit problem, so it has Ω(T^{2/3}) regret [6]. The Ω(T^{2/3}) regret lower bound for weakly-observable games against oblivious opponents naturally extends to games against adaptive opponents.…”

Section: Non-revealing Strongly-observable Games With Switching Costs (mentioning)

confidence: 99%

“…A specific class of unit-memory adversaries of particular interest corresponds to oblivious adversaries with switching costs. Although the minimax regret was shown to be Θ(T^{1/2}) in the case of full-information games and Θ(T^{2/3}) in the case of bandit feedback [5,6], the gap between O(T^{2/3}) upper bounds and Ω(T^{1/2}) lower bounds for the more general class of adversaries with unit memory in the case of full-information feedback has remained unaddressed. For the problem of general feedback graphs with oblivious adversaries, Alon et al. [7,8] showed that the regret is characterized by certain characteristics of the graph structure involving domination numbers and independent sets.…”

Section: Introduction (mentioning)

confidence: 99%
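The statement above treats switching costs as a special case of a unit-memory adversary. A minimal sketch of that reduction, under stated assumptions: the loss at round t depends on the previous and current actions, the comparator is a fixed action x played from the start (so it pays loss(t, x, x) every round), and the first round treats the first action as its own predecessor. The names `policy_regret`, `switching_cost_loss`, and `MemoryOneLoss` are hypothetical, introduced only for illustration.

```python
from typing import Callable, Sequence

# A unit-memory loss: (round t, previous action, current action) -> loss.
MemoryOneLoss = Callable[[int, int, int], float]


def policy_regret(loss: MemoryOneLoss, actions: Sequence[int], n_actions: int) -> float:
    """Policy regret against a memory-1 adversary (hypothetical helper).

    The learner pays loss(t, a_{t-1}, a_t). A fixed action x, had it been played
    from the start, would have paid loss(t, x, x) in every round.
    """
    T = len(actions)
    # Assumed convention: at round 0 the first action is its own predecessor.
    incurred = sum(
        loss(t, actions[t - 1] if t > 0 else actions[0], actions[t]) for t in range(T)
    )
    best_fixed = min(sum(loss(t, x, x) for t in range(T)) for x in range(n_actions))
    return incurred - best_fixed


def switching_cost_loss(f: Sequence[Sequence[float]], cost: float = 1.0) -> MemoryOneLoss:
    """Oblivious losses f[t][a] plus `cost` whenever the action changes."""

    def loss(t: int, prev: int, cur: int) -> float:
        return f[t][cur] + (cost if cur != prev else 0.0)

    return loss
```

With `switching_cost_loss`, a fixed action never pays the switching term, so policy regret reduces to ordinary regret on `f` plus `cost` times the number of switches the learner made.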


Online Learning with Graph-Structured Feedback against Adaptive Adversaries

Feng, Loh 2018

2018 IEEE International Symposium on Information Theory (ISIT)


We derive upper and lower bounds for the policy regret of T-round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have a bounded memory. We obtain upper bounds of O(T^{2/3}) and O(T^{3/4}) for strongly-observable and weakly-observable graphs, respectively, based on analyzing a variant of the Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we show that a matching lower bound of Ω(T^{2/3}) holds in the case of full-information feedback. We also study the particular loss structure of an oblivious adversary with switching costs, and show that in this setting, non-revealing strongly-observable feedback graphs also incur a lower bound of Ω(T^{2/3}).
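Feng and Loh's Exp3 variant for feedback graphs is not reproduced here. As a hedged illustration of how an Exp3-based learner can reach O(T^{2/3}) when switches are costly, the sketch below handles plain bandit feedback with blocking (mini-batching): commit to one arm for τ ≈ T^{1/3} rounds, then feed the block's average loss to Exp3. The function name `blocked_exp3`, the loss callback `loss(t, arm)` returning values in [0, 1], and the learning rate η = √(2 ln K / (nK)) for n blocks are assumptions of this sketch.

```python
import math
import random
from typing import Callable, List


def blocked_exp3(
    loss: Callable[[int, int], float],
    n_arms: int,
    horizon: int,
    seed: int = 0,
) -> List[int]:
    """Exp3 over blocks of rounds, committing to one arm per block (sketch).

    loss(t, arm) must return a value in [0, 1]; only the played arm's loss is used.
    Returns the arm played at each of the `horizon` rounds.
    """
    if horizon <= 0:
        return []
    rng = random.Random(seed)
    # Block length tau ~ T^(1/3), so there are at most ceil(T / tau) - 1 switches.
    tau = max(1, round(horizon ** (1 / 3)))
    n_blocks = math.ceil(horizon / tau)
    # Exp3 learning rate for an n_blocks-round game with losses in [0, 1].
    eta = math.sqrt(2 * math.log(n_arms) / (n_blocks * n_arms))
    log_w = [0.0] * n_arms  # log-weights, kept in log space for numerical stability
    actions: List[int] = []
    t = 0
    while t < horizon:
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        length = min(tau, horizon - t)
        block_loss = 0.0
        for _ in range(length):
            # Bandit feedback: only the played arm's loss is observed.
            block_loss += loss(t, arm)
            actions.append(arm)
            t += 1
        # Importance-weighted estimate of the played arm's average block loss.
        estimate = (block_loss / length) / probs[arm]
        log_w[arm] -= eta * estimate
    return actions
```

For example, `blocked_exp3(lambda t, a: float(a), n_arms=2, horizon=1000)` plays 100 blocks of 10 rounds, so it switches arms at most 99 times; the lambda is a toy loss in which arm 0 is always better. Blocking trades extra estimation error for far fewer switches, which is the trade-off behind the T^{2/3} rate.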
