Cloud radio access network (CRAN) has been shown to be an effective means of boosting network performance. This gain stems from the intelligent management of remote radio heads (RRHs) in terms of on/off operation mode and power consumption. Most conventional resource allocation (RA) methods, however, optimize the network utility without considering the switching overhead of RRHs across adjacent time intervals. When the network environment is time-correlated, conventional mathematical optimization is not directly applicable. In this paper, we aim to optimize the energy efficiency (EE) subject to constraints on per-RRH transmission power and user data rates. To this end, we formulate the EE problem as a Markov decision process (MDP) and adopt a deep reinforcement learning (DRL) technique to maximize the cumulative EE reward. Our starting point is the deep Q network (DQN), which combines deep learning with Q-learning. In each time slot, the DQN selects the RRH on/off configuration with the largest Q-value (the state-action value) and then solves a power minimization problem for the active RRHs. To overcome the Q-value overestimation issue of DQN, we propose a Double DQN (DDQN) framework that achieves a higher cumulative reward than DQN by decoupling action selection from target Q-value generation. Simulation results validate that the DDQN-based RA method is more energy-efficient than both the DQN-based RA algorithm and a baseline solution.
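To make the DQN/DDQN distinction concrete, the sketch below contrasts the two bootstrap targets. This is a minimal illustration, not the paper's implementation: `q_net` and `target_net` are hypothetical callables mapping a state to a vector of Q-values, one per RRH on/off configuration, and `gamma` is the discount factor.

```python
import numpy as np

def dqn_target(reward, next_state, target_net, gamma=0.99):
    # DQN: the target network both selects and evaluates the next action.
    # Taking the max over its own estimates is what drives overestimation.
    return reward + gamma * np.max(target_net(next_state))

def ddqn_target(reward, next_state, q_net, target_net, gamma=0.99):
    # DDQN: the online network selects the greedy action, while the
    # target network evaluates it, decoupling selection from evaluation.
    best_action = np.argmax(q_net(next_state))
    return reward + gamma * target_net(next_state)[best_action]
```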
The rapid growth of user data traffic has pushed the telecommunication sector toward a new generation of networks, the fifth generation (5G). Cloud radio access network (CRAN) has gained considerable attention as a way to satisfy the high traffic demand of 5G networks through the deployment and intelligent management of multiple remote radio heads (RRHs). However, optimizing the instantaneous network performance may lead to myopic decision-making, such as excessive on/off switching of RRHs. This paper proposes a deep reinforcement learning (DRL) based framework with the goal of maximizing the long-term tradeoff between energy efficiency (EE) and spectral efficiency (SE). To this end, we formulate the joint optimization problem as a Markov decision process (MDP) subject to constraints on per-RRH transmission power and user quality of service (QoS) demands. To exploit the spatio-temporal structure of the channel state information (CSI), we adopt machine learning (ML) techniques to extract generalized features before feeding them to the DRL agent. Combining these features with the RRH status and QoS requirements, the proposed algorithm learns a near-optimal strategy for turning RRHs on and off, followed by solving a power optimization problem. Simulation results reveal that the proposed method outperforms both myopic methods and DRL methods that do not consider CSI generalization.
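The per-slot control flow described above can be summarized as follows. This is a sketch under stated assumptions, not the authors' code: `extract_features`, `agent`, and `solve_power_allocation` are hypothetical stand-ins for the ML-based CSI feature extractor, the DRL policy, and the downstream power optimizer, passed in here as callables.

```python
def run_time_slot(csi_window, rrh_status, qos_demands,
                  extract_features, agent, solve_power_allocation):
    # Compress the raw spatio-temporal CSI into generalized features.
    features = extract_features(csi_window)
    # The DRL agent observes the CSI features, the current RRH on/off
    # status, and the user QoS demands.
    state = (features, rrh_status, qos_demands)
    # Decide which RRHs to keep active in this slot.
    new_status = agent.select_action(state)
    # With the active set fixed, solve the power optimization subject to
    # per-RRH transmit-power limits and QoS constraints.
    powers = solve_power_allocation(new_status, qos_demands)
    return new_status, powers
```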