Non-Cooperative Energy Efficient Power Allocation Game in D2D Communication: A Multi-Agent Deep Reinforcement Learning Approach

Nguyen, Khoi Khac; Duong, Trung Q.; Vien, Ngo Anh; Le-Khac, Nhien-An; Nguyen, Minh-Nghia

doi:10.1109/access.2019.2930115

Cited by 74 publications

(52 citation statements)

References 33 publications

“…And, the corresponding distributed iterative algorithm were also proposed and evaluated. In [21], a simulation-based optimization framework for D2D network was proposed to achieve a tradeoff between the energy consumption and network performance. Although the existence of Nash equilibrium point was not proved theoretically, the simulation results showed that the proposed optimization framework has an excellent performance in most case.…”

Section: Related Work (mentioning)

confidence: 99%

Energy Efficient Transmission Power Control Policy of the Delay Tolerable Communication Service

et al. 2020

In recent years, the development of wireless communication has led to explosive growth in energy demand, the wide adoption of smart devices, and the rapid emergence of new services. Energy-efficient communication is therefore urgently needed to save power and prolong the lifetime of resource-constrained terminal devices. In the 5G era in particular, high spectrum efficiency provides more opportunity to save power by adjusting the transmission power of delay-tolerable (DT) services. Meanwhile, although the tradeoff between energy efficiency and service delay plays a non-negligible role in energy-efficient communication, it has not been sufficiently exploited because of the time variation and randomness of the wireless channel. For this reason, the fundamental tradeoff between energy efficiency and delay for DT services is investigated and analyzed. The optimization problem of energy-efficient communication for DT services is formulated as a Markov Decision Process (MDP), which can be solved effectively by statistical dynamic programming (SDP), since perfect channel state information (CSI) is hard to obtain. To make the results more practical, approximate SDP (ASDP) and Q-learning are also investigated to overcome, respectively, the curse of dimensionality and the limitations of model-based algorithms.
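The model-free Q-learning idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the cited paper's formulation: the toy model, its parameters (`ENERGY_PER_PACKET`, `DELAY_WEIGHT`, `P_GOOD`, `P_ARRIVAL`), and the function names are all hypothetical. A device with a small packet backlog observes a two-state channel and decides each step whether to wait or transmit one packet, trading energy against delay.

```python
import random

# Hypothetical toy parameters, chosen for illustration only.
ENERGY_PER_PACKET = {0: 10.0, 1: 1.0}  # channel 0 = bad, 1 = good
DELAY_WEIGHT = 1.0                     # cost per packet still queued after a step
MAX_BACKLOG = 3                        # arrivals beyond this are dropped
P_GOOD = 0.5                           # probability the next channel state is good
P_ARRIVAL = 0.3                        # probability a new packet arrives
WAIT, TRANSMIT = 0, 1


def step(state, action, rng):
    """Simulate one slot; return (next_state, reward) with reward = -cost."""
    backlog, channel = state
    sent = 1 if (action == TRANSMIT and backlog > 0) else 0
    cost = ENERGY_PER_PACKET[channel] * sent + DELAY_WEIGHT * (backlog - sent)
    arrival = 1 if rng.random() < P_ARRIVAL else 0
    next_backlog = min(MAX_BACKLOG, backlog - sent + arrival)
    next_channel = 1 if rng.random() < P_GOOD else 0
    return (next_backlog, next_channel), -cost


def q_learning(steps=100_000, alpha=0.02, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; no channel model is used."""
    rng = random.Random(seed)
    q = {(b, c): [0.0, 0.0] for b in range(MAX_BACKLOG + 1) for c in (0, 1)}
    state = (0, 1)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice((WAIT, TRANSMIT))
        else:
            action = max((WAIT, TRANSMIT), key=lambda a: q[state][a])
        next_state, reward = step(state, action, rng)
        # Standard Q-learning update toward the bootstrapped target.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return {s: max((WAIT, TRANSMIT), key=lambda a: v[a]) for s, v in q.items()}


if __name__ == "__main__":
    policy = greedy_policy(q_learning())
    for s in sorted(policy):
        print(s, "TRANSMIT" if policy[s] == TRANSMIT else "WAIT")
```

With these assumed costs, the learned policy should defer a single queued packet while the channel is bad and send it once the channel is good, which is the energy-delay tradeoff the abstract describes.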

“…Very recently, deterministic policy gradient (DPG) is deployed as an actor-critic algorithm in which the policy gradient theorem is extended from stochastic policy to deterministic policy. Inspired by the success of deep Q-learning [26], which uses neural network function approximation to learn value functions for a very large state and action space online, the combination of DPG and deep learning called deep deterministic policy gradient enables learning in continuous spaces.…”

Section: Distributed Deep Deterministic Policy Gradient (mentioning)

confidence: 99%

“…Deep reinforcement learning, a combination of RL and deep neural network, has been used widely in wireless communication thanks to its powerful features, impressive performance, and adequate processing time. The authors in [26] formulated a non-cooperative power allocation game in D2D communications and proposed three approaches based on deep Q-learning, double deep Q-learning, and dueling deep Q-learning algorithm for multi-agent learning to find the optimal power level for each D2D pair in order to maximise the network performance. The authors in [27] used deep Q-learning algorithm to look for the optimal sub-band and transmission power level for each V2V user in V2V communications while satisfying the requirement of low latency.…”

Section: Introduction (mentioning)

confidence: 99%

Distributed Deep Deterministic Policy Gradient for Power Allocation Control in D2D-Based V2V Communications

et al. 2019

Self Cite

Device-to-device (D2D) communication is an emerging technology in the evolution of 5G network-enabled vehicle-to-vehicle (V2V) communications. It is a core technique for the next generation of many platforms and applications, e.g. real-time high-quality video streaming, virtual reality games, and smart city operation. However, the rapid proliferation of user devices and sensors creates a need for more efficient resource allocation algorithms that enhance network performance while still guaranteeing quality of service. Currently, deep reinforcement learning is emerging as a powerful tool that gives each node in the network a real-time self-organising ability. In this paper, we present two novel approaches based on the deep deterministic policy gradient algorithm, namely "distributed deep deterministic policy gradient" and "sharing deep deterministic policy gradient", for the multi-agent power allocation problem in D2D-based V2V communications. Numerical results show that our proposed models outperform other deep reinforcement learning approaches in terms of the network's energy efficiency and flexibility.

Index terms: non-cooperative D2D communication, D2D-based V2V communications, power allocation, multi-agent deep reinforcement learning, deep deterministic policy gradient (DDPG).

“…The characteristics of each fault current reduction method introduced above are summarized in Table 1. This study applies reinforcement learning (RL) [22][23][24][25][26][27][28][29][30][31][32][33][34] to conduct bus and line separation more systematically; these are the most widely used techniques for grid operation as they can be performed immediately and without additional cost. Because there are many buses and lines in a grid, there are numerous ways to reduce short circuit current.…”

Section: Introduction (mentioning)

confidence: 99%

Control Method of Buses and Lines Using Reinforcement Learning for Short Circuit Current Reduction

Han

2020

Sustainability

This paper proposes a reinforcement learning-based approach that optimises bus and line control methods to solve the problem of short circuit currents in power systems. Expansion of power grids leads to concentrated power output and more lines for large-scale transmission, thereby increasing short circuit currents. The short circuit currents must be managed systematically by controlling the buses and lines such as separating, merging, and moving a bus, line, or transformer. However, there are countless possible control schemes in an actual grid. Moreover, to ensure compliance with power system reliability standards, no bus should exceed breaker capacity nor should lines or transformers be overloaded. For this reason, examining and selecting a plan requires extensive time and effort. To solve these problems, this paper introduces reinforcement learning to optimise control methods. By providing appropriate rewards for each control action, a policy is set, and the optimal control method is obtained through a maximising value method. In addition, a technique is presented that systematically defines the bus and line separation measures, limits the range of measures to those with actual power grid applicability, and reduces the optimisation time while increasing the convergence probability and enabling use in actual power grid operation. In the future, this technique will contribute significantly to establishing power grid operation plans based on short circuit currents.
