In cognitive radio networks (CRNs), dynamic spectrum access has been proposed to improve the spectrum utilization, but it also generates spectrum misuse problems. One common solution to these problems is to deploy monitors to detect misbehaviors on certain channel. However, in multi-channel CRNs, it is very costly to deploy monitors on every channel. With a limited number of monitors, we have to decide which channels to monitor. In addition, we need to determine how long to monitor each channel and in which order to monitor, because switching channels incurs costs. Moreover, the information about the misuse behavior is not available a priori. To answer those questions, we model the spectrum monitoring problem as an adversarial multi-armed bandit problem with switching costs (MAB-SC), propose an effective framework, and design two online algorithms, SpecWatch-II and SpecWatch-III, based on the same framework. To evaluate the algorithms, we use weak regret, i.e., the performance difference between the solution of our algorithm and optimal (fixed) solution in hindsight, as the metric.We prove that the expected weak regret of SpecWatch-II is O(T 2/3 ), where T is the time horizon.Whereas, the actual weak regret of SpecWatch-III is O(T 2/3 ) with probability 1 − δ, for any δ ∈ (0, 1).Both algorithms guarantee the upper bounds matching the lower bound of the general adversarial MAB-SC problem. Therefore, they are all asymptotically optimal.
Index TermsCognitive Radio Networks, Spectrum Monitoring, Multi-armed Bandit Problem. This is an extended and enhanced version of the paper [50] that appeared in INFOCOM 2016.