2019
DOI: 10.3390/app9040669

Automated Enemy Avoidance of Unmanned Aerial Vehicles Based on Reinforcement Learning

Abstract: This paper focuses on one of the collision avoidance scenarios for unmanned aerial vehicles (UAVs), in which the UAV must avoid colliding with an enemy UAV along its flight path to the goal point. This type of problem is defined as the enemy avoidance problem in this paper. To deal with this problem, a learning-based framework is proposed. Under this framework, the enemy avoidance problem is formulated as a Markov Decision Process (MDP), and the maneuver policies for the UAV are learned with a temporal-difference reinforcement learning method.
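To make the formulation concrete, here is a minimal sketch of how such an enemy-avoidance scenario could be encoded as an MDP-style environment. Everything here is a hedged illustration: the grid abstraction, the class name `EnemyAvoidanceEnv`, the four-maneuver action set, the random enemy motion, and the reward values are assumptions, not the paper's actual state, action, or reward design.

```python
import random

# Hypothetical grid-world abstraction of the enemy-avoidance MDP.
# State: (own UAV cell, enemy UAV cell). The paper likely uses a richer
# continuous state, so treat this purely as an illustration.
class EnemyAvoidanceEnv:
    def __init__(self, size=5):
        self.size = size
        self.reset()

    def reset(self):
        self.uav = (0, 0)                                # own UAV start
        self.enemy = (self.size // 2, self.size // 2)    # enemy UAV start
        self.goal = (self.size - 1, self.size - 1)       # goal point
        return (self.uav, self.enemy)

    def step(self, action):
        # Discrete maneuver set: 0=up, 1=down, 2=left, 3=right.
        dx, dy = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        x = min(max(self.uav[0] + dx, 0), self.size - 1)
        y = min(max(self.uav[1] + dy, 0), self.size - 1)
        self.uav = (x, y)
        # Enemy moves randomly -- a stand-in for its real behavior.
        ex, ey = self.enemy
        edx, edy = random.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
        self.enemy = (min(max(ex + edx, 0), self.size - 1),
                      min(max(ey + edy, 0), self.size - 1))
        state = (self.uav, self.enemy)
        if self.uav == self.enemy:        # collision with the enemy UAV
            return state, -10.0, True
        if self.uav == self.goal:         # reached the goal point
            return state, 10.0, True
        return state, -0.1, False         # small per-step cost
```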

Cited by 15 publications (11 citation statements) | References 32 publications
“…Classical reinforcement learning can hardly, if at all, traverse every case when the state and action spaces are high-dimensional, which may result in slow convergence of the algorithm or an inability to learn reasonable policies. An effective way to solve this problem is to use function approximation to represent the value function or policy explicitly [21]. Deep neural networks approximate complex nonlinear functions well, so introducing them into RL to approximate the value function or policy function has become a trend in recent years [20].…”
Section: B. DDPG Algorithm
confidence: 99%
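As a minimal sketch of the idea in this statement, the snippet below replaces a lookup table with a small neural network that approximates the action values Q(s, ·) and is trained by gradient steps toward a TD target. The two-layer tanh architecture, layer sizes, and learning rate are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, HIDDEN, N_ACTIONS, LR = 4, 32, 4, 1e-2
W1 = rng.normal(0.0, 0.1, (STATE_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS)); b2 = np.zeros(N_ACTIONS)

def q_values(s):
    """Forward pass: returns hidden activations and Q(s, a) for all a."""
    h = np.tanh(s @ W1 + b1)
    return h, h @ W2 + b2

def td_update(s, a, target):
    """One gradient step pulling Q(s, a) toward a TD target."""
    global W1, b1, W2, b2
    h, q = q_values(s)
    err = q[a] - target                 # TD error on the taken action
    one_hot = np.eye(N_ACTIONS)[a]
    gW2 = np.outer(h, one_hot) * err    # gradients for loss 0.5 * err**2
    gb2 = one_hot * err
    gh = W2[:, a] * err                 # backprop into the hidden layer
    gpre = gh * (1.0 - h ** 2)          # tanh'(x) = 1 - tanh(x)**2
    W2 -= LR * gW2; b2 -= LR * gb2
    W1 -= LR * np.outer(s, gpre); b1 -= LR * gpre
```

A typical call is `td_update(s, a, r + 0.99 * q_values(s_next)[1].max())`, i.e. one gradient step toward a one-step TD target, which is exactly what tabular methods cannot do once the state space is too large to enumerate.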
“…To meet the requirement of discrete control, the deep Q-network (DQN), a traditional DRL method for discrete action spaces, was applied. Second, the efficiency of RL still needs further improvement, although numerous researchers have focused on the issue [21], [22]. Typically, DDPG, an algorithm for solving continuous control problems, was proposed by DeepMind in 2016 [17].…”
Section: Introduction
confidence: 99%
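A brief sketch of the contrast this statement draws: DQN picks among discrete actions by an argmax over per-action values, while DDPG's deterministic actor outputs a continuous action directly. The linear stand-in "networks", their dimensions, and the noise scale below are assumptions made only to keep the example runnable.

```python
import numpy as np

rng = np.random.default_rng(1)
Wq = rng.normal(size=(3, 4))        # 3-D state -> 4 discrete maneuvers
Wa = rng.normal(size=(3, 2))        # 3-D state -> 2-D continuous command
q_net = lambda s: s @ Wq            # stand-in for a deep Q-network
actor = lambda s: np.tanh(s @ Wa)   # stand-in for a deterministic actor

def dqn_action(state, eps=0.1):
    """DQN-style discrete control: epsilon-greedy argmax over Q-values."""
    if rng.random() < eps:
        return int(rng.integers(4))             # exploratory random action
    return int(np.argmax(q_net(state)))

def ddpg_action(state, noise_std=0.1):
    """DDPG-style continuous control: deterministic actor plus noise.
    Gaussian noise here; the original DDPG paper used an
    Ornstein-Uhlenbeck process for exploration."""
    a = actor(state) + rng.normal(0.0, noise_std, size=2)
    return np.clip(a, -1.0, 1.0)

s = rng.normal(size=3)
print(dqn_action(s), ddpg_action(s))
```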
“…Reinforcement learning has been used in traffic light configuration to help find the best state by rewarding the actions chosen by the agent. Common reinforcement learning methods are Q-learning [24,25], Sarsa [26,27], and policy gradients [28,29]. Among them, Q-learning is a prominent method in the field of control, which can reduce the risks and burdens caused by manual control.…”
Section: Reinforcement Learning
confidence: 99%
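For reference, a minimal sketch of the two tabular update rules named above. The only difference is the bootstrap term: Q-learning is off-policy and bootstraps from the greedy action in the next state, while Sarsa is on-policy and bootstraps from the action actually taken. The step size and discount factor are illustrative values.

```python
import numpy as np

ALPHA, GAMMA = 0.1, 0.99            # illustrative step size and discount

def q_learning_update(Q, s, a, r, s_next):
    """Off-policy: target uses the max over next-state actions."""
    Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy: target uses the action the policy actually took."""
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next, a_next] - Q[s, a])
```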
“…Recently, control of UAVs (unmanned aerial vehicles) has been a challenging problem; an example task is that a robot needs to avoid collision with an enemy UAV on its flight path to the goal. Cheng et al. [19] formulated this as a Markov decision process and applied temporal-difference reinforcement learning to the robot control. The learned policy achieves good performance, reaching the goal without colliding with the enemy.…”
Section: Advanced Mobile Robotics
confidence: 99%
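As a minimal sketch of the temporal-difference idea this statement refers to, the one-step TD(0) rule below nudges a state-value estimate toward the bootstrapped target r + gamma * V(s_next). The dictionary value table and the hyperparameter values are assumptions for illustration only.

```python
ALPHA, GAMMA = 0.1, 0.95            # illustrative step size and discount

def td0_update(V, s, r, s_next, done):
    """One-step TD(0) update of a state-value table V (a dict)."""
    v = V.get(s, 0.0)
    target = r if done else r + GAMMA * V.get(s_next, 0.0)
    V[s] = v + ALPHA * (target - v)
```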