2021 | DOI: 10.2514/1.i010961

Cooperative Planning for an Unmanned Combat Aerial Vehicle Fleet Using Reinforcement Learning

Abstract: In this study, reinforcement learning (RL)-based centralized path planning is performed for an unmanned combat aerial vehicle (UCAV) fleet in a human-made hostile environment. The proposed method provides a novel approach in which closing-speed and approximate time-to-go terms are used in the reward function to obtain cooperative motion while satisfying no-fly-zone (NFZ) and time-of-arrival constraints. The proximal policy optimization (PPO) algorithm is used in the training phase of the RL agent. System performan…
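As a rough illustration of the reward-shaping idea described in the abstract, the sketch below combines a closing-speed term, an approximate time-to-go mismatch term, and a no-fly-zone penalty for a single vehicle. All weights, argument names, and the time-to-go approximation are hypothetical placeholders, not the paper's actual formulation.

```python
import numpy as np

def shaped_reward(pos, vel, target, t_go_desired, nfz_centers, nfz_radii,
                  w_close=1.0, w_time=0.5, w_nfz=10.0):
    """Hypothetical per-step reward mixing closing speed, approximate
    time-to-go, and no-fly-zone (NFZ) penalties for one UCAV."""
    rel = target - pos                           # line-of-sight vector to the target
    dist = np.linalg.norm(rel) + 1e-6
    closing_speed = np.dot(vel, rel) / dist      # positive when flying toward the target
    speed = np.linalg.norm(vel) + 1e-6
    t_go = dist / speed                          # crude time-to-go estimate
    reward = w_close * closing_speed             # encourage approach
    reward -= w_time * abs(t_go - t_go_desired)  # penalize time-of-arrival mismatch
    for center, radius in zip(nfz_centers, nfz_radii):
        if np.linalg.norm(pos - center) < radius:
            reward -= w_nfz                      # hard penalty inside any NFZ
    return reward
```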

Cited by 11 publications (4 citation statements)
References 20 publications

Citation statements:
“…The usage of the PPO algorithm in short-range air combat. The second improvement point is to differentiate the neural network inputs. For the basic PPO algorithm, the critic network uses the same input as the actor network, which is named the state space [20]. However, the actor and critic networks play different roles in the algorithm.…”
Section: Algorithm (mentioning, confidence: 99%)
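The point quoted above, giving the critic a richer or different input than the actor, can be sketched as an asymmetric actor-critic module for PPO; the layer sizes and input dimensions below are assumptions for illustration, not the cited paper's architecture.

```python
import torch.nn as nn

class AsymmetricActorCritic(nn.Module):
    """Actor and critic with separate inputs: the actor sees only the red
    side's observation, while the critic sees a fuller environment state."""
    def __init__(self, obs_dim=8, critic_state_dim=10, action_dim=4, hidden=64):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),       # action logits / means
        )
        self.critic = nn.Sequential(
            nn.Linear(critic_state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),                # scalar state-value estimate
        )

    def forward(self, obs, critic_state):
        return self.actor(obs), self.critic(critic_state)
```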
“…The actor network gets its action according to the relationship as viewed by the red aircraft, but the critic network gets its evaluation value based on the state of the air combat environment, which can include information that cannot be observed. Thus, the input for the critic network, $s_{critic}$, is defined as [17]

$$s_{critic} = [D,\ \varphi_r,\ \varphi_b,\ z_b - z_r,\ \psi_D,\ \psi_{br},\ \theta_{br},\ v_r - v_b,\ B_r,\ B_{rb}]^T, \qquad (20)$$

where the subscript $rb$ means the red's value minus the blue's and $br$ is the opposite, and $B_r$ is the red's residual blood. The critic network's state variables are also normalized, and the threshold vector $\delta_{critic}$ is selected as…”
Section: The Critic Network's State Space (mentioning, confidence: 99%)
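One possible reading of Eq. (20) in code: assembling the critic's ten-component state vector and normalizing it element-wise by a threshold vector. The variable names mirror the quoted notation, but the threshold values are purely illustrative assumptions.

```python
import numpy as np

def critic_state(D, phi_r, phi_b, z_r, z_b, psi_D, psi_br, theta_br,
                 v_r, v_b, B_r, B_rb, delta_critic):
    """Build and normalize the critic input of Eq. (20); delta_critic is the
    element-wise threshold (normalization) vector."""
    s = np.array([D, phi_r, phi_b, z_b - z_r, psi_D, psi_br,
                  theta_br, v_r - v_b, B_r, B_rb], dtype=float)
    return s / delta_critic  # element-wise normalization

# Hypothetical thresholds, one per component of s_critic.
delta_critic = np.array([10_000.0, np.pi, np.pi, 5_000.0, np.pi,
                         np.pi, np.pi, 300.0, 100.0, 100.0])
```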
“…Accordingly, optimization and learning techniques are employed in their applications to increase the number of UAV types and extend their operational range. In [1], centralized path planning based on reinforcement learning is implemented for a combat aerial vehicle fleet in order to avoid enemy defense systems. To localize radio-frequency-emitting targets in the operation area, multiple UAVs were deployed in [2] using Particle Filter and Extended Kalman Filter algorithms and vision-based detection.…”
Section: Introduction (mentioning, confidence: 99%)
“…We found that the entropy shrinks to a small value prematurely during the training process, resulting in insufficient exploration in policy learning and therefore a defective policy. To address this problem, we introduce entropy regularization [31] into Equation (3).…”
mentioning, confidence: 99%
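The entropy-regularization remedy quoted above is typically implemented by adding an entropy bonus to PPO's clipped surrogate objective; the sketch below shows that standard pattern, with the coefficient value and tensor names assumed rather than taken from the cited paper.

```python
import torch

def ppo_loss(new_logp, old_logp, advantages, entropy,
             clip_eps=0.2, ent_coef=0.01):
    """PPO clipped surrogate loss with an entropy bonus that discourages
    the policy entropy from collapsing prematurely."""
    ratio = torch.exp(new_logp - old_logp)                 # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()    # clipped surrogate
    return policy_loss - ent_coef * entropy.mean()         # entropy regularization
```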