We consider a dynamic multichannel access problem in which multiple correlated channels follow an unknown joint Markov model. At each time slot, a user selects a channel to transmit data and receives a reward based on the success or failure of the transmission. The objective is to find a policy that maximizes the expected long-term reward. The problem is formulated as a partially observable Markov decision process (POMDP) with unknown system dynamics. To overcome the challenges of unknown system dynamics and prohibitive computation, we apply reinforcement learning and implement a Deep Q-Network (DQN) that can handle a large state space without any prior knowledge of the system dynamics. We provide an analytical study of the optimal policy for fixed-pattern channel switching with known system dynamics and show through simulations that DQN can achieve the same optimal performance without knowing the system statistics. We compare the performance of DQN with a Myopic policy and a Whittle Index-based heuristic, using both simulations and a real-data trace, and show that DQN achieves near-optimal performance in more complex situations. Finally, we propose an adaptive DQN approach that can adapt its learning in time-varying, dynamic scenarios.

Prior work [3] has shown that dynamic spectrum access is one of the keys to improving spectrum utilization in wireless networks and meeting the increasing need for capacity, particularly in the presence of other networks operating in the same spectrum. In the context of cognitive radio research, a standard assumption has been that secondary users may search for and use idle channels that are not being used by their primary users (PUs). Although many existing works focus on algorithm design and implementation in this field, nearly all of them assume a simple independent-channel (or independent PU activity) model, which may not hold in practice.
For instance, a low-power wireless sensor network (WSN) operates on IEEE 802.15.4 radios, which use the globally available 2.4 GHz and 868/900 MHz bands. These bands are shared by various wireless technologies (e.g. Wi-Fi, Bluetooth, RFID), as well as industrial/scientific equipment and appliances (e.g. microwave ovens) whose activities can affect multiple IEEE 802.15.4 channels. Thus, external interference can cause the channels in WSNs to be highly correlated, and new algorithms and schemes for dynamic multichannel access are required to tackle this challenge.

Motivated by such practical considerations, we consider in this work a multichannel access problem with N correlated channels. Each channel has two possible states, good or bad, and their joint distribution follows a 2^N-state Markovian model. There is a single user (wireless node) that selects one channel at each time slot to transmit a packet. If the selected channel is in the good state, the transmission is successful; otherwise, there is a transmission failure. The goal is to obtain as many successful transmissi...
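
The channel model described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' simulator: the class name `CorrelatedChannels`, the helper `run_policy`, the bit encoding of the joint state, the reward values of +1 for success and -1 for failure, and the two-channel transition matrix are all hypothetical choices made for illustration. The user observes only the channel it selects, which is what makes the problem partially observable.

```python
import random


class CorrelatedChannels:
    """Toy environment: N channels whose joint state follows a 2^N-state Markov chain."""

    def __init__(self, n_channels, transition, initial_state=0, seed=0):
        self.n = n_channels
        self.num_states = 2 ** n_channels
        # The transition matrix must be square and row-stochastic.
        assert len(transition) == self.num_states
        for row in transition:
            assert len(row) == self.num_states
            assert abs(sum(row) - 1.0) < 1e-9
        self.P = transition
        self.rng = random.Random(seed)
        self.state = initial_state

    def channel_is_good(self, channel):
        # Assumed encoding: bit `channel` of the joint state is 1 if good, 0 if bad.
        return (self.state >> channel) & 1 == 1

    def step(self, channel):
        """Transmit on `channel` in the current slot, then advance the chain."""
        good = self.channel_is_good(channel)
        reward = 1 if good else -1  # assumed reward values
        self.state = self.rng.choices(
            range(self.num_states), weights=self.P[self.state]
        )[0]
        # Only the selected channel's state is revealed to the user.
        return good, reward


def run_policy(env, policy, horizon):
    """Sum the rewards collected by `policy(t) -> channel` over `horizon` slots."""
    return sum(env.step(policy(t))[1] for t in range(horizon))


# Toy 2-channel chain: exactly one channel is good per slot, and the good
# channel alternates deterministically (joint state 1 -> 2 -> 1 -> ...).
P = [
    [0.0, 1.0, 0.0, 0.0],  # state 0 (both bad)       -> state 1
    [0.0, 0.0, 1.0, 0.0],  # state 1 (channel 0 good) -> state 2
    [0.0, 1.0, 0.0, 0.0],  # state 2 (channel 1 good) -> state 1
    [0.0, 1.0, 0.0, 0.0],  # state 3 (both good)      -> state 1
]

switching_total = run_policy(CorrelatedChannels(2, P, initial_state=1), lambda t: t % 2, 10)
staying_total = run_policy(CorrelatedChannels(2, P, initial_state=1), lambda t: 0, 10)
print(switching_total, staying_total)
```

In this toy chain, a policy that alternates between the two channels succeeds in every slot, while a policy that stays on channel 0 succeeds only in every other slot. A learning agent would have to discover such a switching pattern from its own observations, without access to `P`.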