Model-based power allocation algorithms have been investigated for decades, but they require the mathematical models to be analytically tractable and usually have high computational complexity. Recently, data-driven, model-free, machine-learning-enabled approaches have been developed rapidly to obtain near-optimal performance with affordable computational complexity, and deep reinforcement learning (DRL) is regarded as having great potential for future intelligent networks. In this paper, DRL approaches are considered for power control in multi-user cellular wireless communication networks. Taking into account cross-cell cooperation, off-line/on-line centralized training, and distributed execution, we present a mathematical analysis of the DRL-based top-level design. Building on this foundation, the concrete DRL design is further developed, and policy-based REINFORCE, value-based deep Q-learning (DQL), and actor-critic deep deterministic policy gradient (DDPG) algorithms are proposed. Simulation results show that the proposed data-driven approaches outperform state-of-the-art model-based methods in sum-rate performance, with good generalization ability and faster processing speed. Furthermore, the proposed DDPG outperforms REINFORCE and DQL in terms of both sum-rate performance and robustness, and, owing to its generality, it can be incorporated into existing resource allocation schemes.

Index Terms: Deep reinforcement learning, deep deterministic policy gradient, policy-based, interfering multiple-access channel, power control, resource allocation.
I. INTRODUCTION

Wireless data transmission has grown tremendously in recent years and will continue to grow in the future. As large numbers of terminals such as mobile phones and wearable devices connect to the networks, the density of access points (APs) will have to increase. Dense deployment of small cells, such as pico-cells and femto-cells, has become the most effective solution for meeting the critical demand for spectrum [1]. With denser APs and smaller cells, the whole communication network is flooded with wireless signals, so intra-cell and inter-cell interference become severe [2]. Therefore, power allocation and interference management are both crucial and challenging [3], [4].

Numerous model-oriented algorithms have been developed for interference management [5]-[9], and existing studies mainly focus on sub-optimal or heuristic algorithms whose performance gaps to the optimal solution are typically difficult to quantify. Besides, the mathematical models are usually assumed to be analytically tractable, but such models are not always accurate, because both hardware and channel imperfections can exist in practical communication environments. When specific hardware components and realistic transmission scenarios are considered, such as low-resolution A/D converters, nonlinear amplifiers, and user distributions, it is challenging to develop signal processing techniques with model-driven tools. Moreover, the computational complexity of these algorithms i...