To enable intelligent UAV decision-making based on situation information in air combat, a novel maneuvering decision method based on deep reinforcement learning is proposed in this paper. The autonomous maneuvering model of the UAV is formulated as a Markov Decision Process. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and the Deep Deterministic Policy Gradient (DDPG) algorithm in deep reinforcement learning are used to train the model, and the experimental results of the two algorithms are analyzed and compared. The simulation results show that, compared with the DDPG algorithm, the TD3 algorithm has stronger decision-making performance and faster convergence speed, and is therefore more suitable for solving air combat problems. The proposed algorithm enables UAVs to autonomously make maneuvering decisions based on situation information such as position, speed, and relative azimuth, and to adjust their actions to approach and successfully strike the enemy, providing a new method for intelligent UAV maneuvering decisions during air combat.
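A minimal sketch of the TD3 update this abstract refers to is shown below, assuming a PyTorch implementation; the network, optimizer, and hyperparameter names are placeholders for illustration, not the paper's code. It shows the three features that distinguish TD3 from DDPG: twin critics with clipped double-Q targets, target policy smoothing, and delayed actor updates.

```python
import torch
import torch.nn.functional as F

def td3_update(actor, actor_target, critic1, critic2, critic1_target, critic2_target,
               actor_opt, critic_opt, batch, step,
               gamma=0.99, tau=0.005, policy_noise=0.2, noise_clip=0.5,
               policy_delay=2, max_action=1.0):
    """One TD3 step: twin critics, target policy smoothing, delayed actor update."""
    state, action, reward, next_state, done = batch

    with torch.no_grad():
        # Target policy smoothing: add clipped noise to the target action.
        noise = (torch.randn_like(action) * policy_noise).clamp(-noise_clip, noise_clip)
        next_action = (actor_target(next_state) + noise).clamp(-max_action, max_action)
        # Clipped double-Q: take the minimum of the two target critics.
        target_q = torch.min(critic1_target(next_state, next_action),
                             critic2_target(next_state, next_action))
        target_q = reward + gamma * (1.0 - done) * target_q

    # Update both critics toward the shared target (one optimizer over both networks).
    critic_loss = F.mse_loss(critic1(state, action), target_q) + \
                  F.mse_loss(critic2(state, action), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Delayed policy update: refresh the actor and targets every `policy_delay` steps.
    if step % policy_delay == 0:
        actor_loss = -critic1(state, actor(state)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
        for net, target in ((actor, actor_target), (critic1, critic1_target),
                            (critic2, critic2_target)):
            for p, tp in zip(net.parameters(), target.parameters()):
                tp.data.mul_(1 - tau).add_(tau * p.data)   # soft target update
```

The clipped double-Q target and the delayed actor update are what reduce value overestimation relative to DDPG, which is consistent with the faster and more stable convergence the abstract reports.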
In this paper, an intelligent algorithm integrating model predictive control and the Standoff algorithm is proposed to solve the trajectory-planning problem that UAVs face while cooperatively tracking a moving target in a complex three-dimensional environment. A fusion model combining model predictive control and the Standoff algorithm is constructed to ensure trajectory planning and formation maintenance, maximizing the detection range of the UAV sensors while minimizing the probability of losing the target. With this model, a fully connected communication topology is used for UAV communication, so the multi-UAV formation can be reconfigured and planned at minimum cost, compensating for the Standoff algorithm's inability to avoid obstacles in real time. Simulation results suggest that, compared with the model predictive control algorithm alone, the fusion algorithm is more capable of keeping the UAVs in a stable formation and detecting the target while tracking a moving target in a complex 3D environment.
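The sketch below illustrates, under stated assumptions, how a receding-horizon (model predictive control) step could be combined with a standoff objective: candidate heading-rate sequences are sampled and scored against a cost that keeps the UAV on a standoff circle around a constant-velocity target. The planar kinematic model, cost weights, and function name `standoff_mpc_step` are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def standoff_mpc_step(uav, target, r_d=200.0, v=30.0, dt=0.5, horizon=10, n_samples=200):
    """Return the first heading-rate command of the sampled sequence that best keeps
    the UAV on a standoff circle of radius r_d around the moving target."""
    best_cost, best_u0 = np.inf, 0.0
    for _ in range(n_samples):
        u_seq = np.random.uniform(-0.5, 0.5, horizon)  # candidate heading rates (rad/s)
        x, y, psi = uav                                 # planar kinematic UAV state
        tx, ty, tvx, tvy = target                       # target position and velocity
        cost = 0.0
        for u in u_seq:
            psi += u * dt
            x += v * np.cos(psi) * dt
            y += v * np.sin(psi) * dt
            tx += tvx * dt
            ty += tvy * dt
            rng = np.hypot(x - tx, y - ty)
            cost += (rng - r_d) ** 2 + 50.0 * u ** 2    # standoff tracking + control effort
        if cost < best_cost:
            best_cost, best_u0 = cost, u_seq[0]
    return best_u0  # receding horizon: apply only the first command, then replan

# Hypothetical usage: uav = (x, y, heading), target = (x, y, vx, vy)
cmd = standoff_mpc_step(uav=(0.0, 0.0, 0.0), target=(300.0, 100.0, 5.0, 0.0))
```

In a full fusion scheme, obstacle and inter-UAV spacing penalties would be added to the same cost, which is how the MPC layer can supply the real-time obstacle avoidance that the pure Standoff guidance lacks.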
To address the problem of manoeuvring decision-making in UAV air combat, this study establishes a one-to-one air combat model, defines missile attack areas, and uses the non-deterministic-policy Soft Actor-Critic (SAC) algorithm in deep reinforcement learning to construct a decision model that realizes the manoeuvring process. The complexity of the proposed algorithm is calculated, and the stability of the closed-loop air combat decision-making system controlled by the neural network is analysed with a Lyapunov function. The study formulates the UAV air combat process as a game and proposes a Parallel Self-Play SAC algorithm (PSP-SAC) to improve the generalisation performance of UAV control decisions. Simulation results show that the proposed algorithm realizes sample sharing and policy sharing across multiple combat environments and significantly improves the generalisation ability of the model compared with independent training.
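A minimal sketch of the parallel self-play idea, assuming a shared policy and a shared replay buffer across several combat environments, is given below; the environment interface, the `agent` object, and the `sac_update` routine are placeholders, not the paper's implementation.

```python
import random
from collections import deque

class SharedReplayBuffer:
    """Replay buffer shared by all parallel self-play environments (sample sharing)."""
    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def parallel_self_play(envs, agent, sac_update, episodes=1000, batch_size=256):
    """One shared SAC policy controls both sides of every duel (policy sharing),
    and all transitions feed a single buffer (sample sharing)."""
    buffer = SharedReplayBuffer()
    for _ in range(episodes):
        for env in envs:                       # several combat environments in parallel
            obs_red, obs_blue = env.reset()
            done = False
            while not done:
                act_red = agent.act(obs_red)   # self-play: same policy plays both roles
                act_blue = agent.act(obs_blue)
                (next_red, next_blue), (r_red, r_blue), done = env.step(act_red, act_blue)
                buffer.add((obs_red, act_red, r_red, next_red, done))
                buffer.add((obs_blue, act_blue, r_blue, next_blue, done))
                obs_red, obs_blue = next_red, next_blue
                if len(buffer.buffer) >= batch_size:
                    sac_update(agent, buffer.sample(batch_size))
```

Because the same policy is trained on transitions collected from many distinct initial situations and both sides of the engagement, it is exposed to a far broader state distribution than an independently trained agent, which is the mechanism behind the generalisation gain the abstract reports.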
The demand for autonomous motion control of unmanned aerial vehicles in air combat is growing as taking the initiative in combat becomes increasingly crucial. However, the inability of unmanned aerial vehicles to manoeuvre autonomously during air combat, which features highly dynamic and uncertain enemy manoeuvres, limits their combat capabilities and remains very challenging. To meet this challenge, this article proposes an autonomous manoeuvre decision model using an expert actor-based soft actor-critic algorithm that reconstructs the experience replay buffer with expert experience. Specifically, the algorithm uses a small amount of expert experience to increase the diversity of the samples, which can largely improve the exploration and utilisation efficiency of deep reinforcement learning. To simulate the complex battlefield environment, a one-to-one air combat model is established and the concept of the missile's attack region is introduced. The model enables one-to-one air combat to be simulated under different initial battlefield situations. Simulation results show that the expert actor-based soft actor-critic algorithm finds the most favourable policy for unmanned aerial vehicles to defeat the opponent faster, and converges more quickly, than the soft actor-critic algorithm.
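The following sketch shows one common way to reconstruct a replay buffer with expert experience, as the abstract describes at a high level: a small fixed pool of expert transitions is kept alongside the agent's own experience, and each training batch is drawn as a mixture of the two. The class name, the mixture ratio, and the buffer sizes are illustrative assumptions, not the paper's design.

```python
import random
from collections import deque

class ExpertSeededBuffer:
    """Replay buffer that mixes a small pool of expert transitions with the
    agent's own experience when sampling training batches."""
    def __init__(self, capacity=1_000_000, expert_ratio=0.1):
        self.agent_buffer = deque(maxlen=capacity)
        self.expert_buffer = []            # small, fixed set of expert demonstrations
        self.expert_ratio = expert_ratio   # assumed fraction of each batch from experts

    def add_expert(self, transitions):
        self.expert_buffer.extend(transitions)

    def add(self, transition):
        self.agent_buffer.append(transition)

    def sample(self, batch_size):
        # Assumes both pools already hold enough transitions for the requested batch.
        n_expert = min(int(batch_size * self.expert_ratio), len(self.expert_buffer))
        batch = random.sample(self.expert_buffer, n_expert)
        batch += random.sample(self.agent_buffer, batch_size - n_expert)
        random.shuffle(batch)
        return batch
```

Mixing even a small proportion of expert transitions into every batch exposes the critic to high-value regions of the state space early in training, which is consistent with the faster convergence the abstract claims.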
To achieve intelligent perception and obstacle avoidance for a UAV in its environment, a UAV visual flight control method based on deep reinforcement learning is proposed in this paper. The method applies a Gated Recurrent Unit (GRU) in the UAV flight control decision network and uses the Deep Deterministic Policy Gradient (DDPG) deep reinforcement learning algorithm to train the network. The gating structure of the GRU is used to memorize historical information and to learn how the UAV's environment evolves from time-series data, including obstacle image information and UAV position and velocity, enabling dynamic perception of obstacles. The basic framework and training method of the network are introduced, and the generalization ability of the network is tested. The experimental results show that the proposed method has strong generalization ability and good adaptability to the environment.
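A minimal sketch of a GRU-based actor of the kind this abstract describes is shown below, assuming a PyTorch implementation trained with DDPG; the observation dimension, layer sizes, and class name are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class GRUActor(nn.Module):
    """DDPG-style actor: a GRU summarises the recent observation history so the
    policy can infer obstacle motion that a single frame cannot reveal."""
    def __init__(self, obs_dim=64, hidden_dim=128, action_dim=4):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, 64),
            nn.ReLU(),
            nn.Linear(64, action_dim),
            nn.Tanh(),                      # bounded control commands
        )

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) history of obstacle image features
        # concatenated with UAV position and velocity.
        out, hidden = self.gru(obs_seq, hidden)
        action = self.head(out[:, -1])      # act on the latest hidden state
        return action, hidden

# Hypothetical usage: a batch of 8 histories, each 10 steps of 64-dim observations.
actor = GRUActor()
actions, h = actor(torch.randn(8, 10, 64))
```

Feeding the critic and actor a recurrent summary of the observation history rather than a single frame is what lets the policy anticipate moving obstacles, which is the dynamic perception the abstract refers to.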