Path planning and obstacle avoidance are pivotal for intelligent unmanned aerial vehicle (UAV) systems in various domains, such as post-disaster rescue, target detection, and wildlife conservation. Reinforcement learning (RL) has become increasingly popular for UAV decision-making. However, RL approaches face the challenges of partial observability and a large state space when searching for random targets with continuous actions. This paper proposes a representation enhancement-based proximal policy optimization (RE-PPO) framework to address these issues. The representation enhancement (RE) module consists of observation memory improvement (OMI) and dynamic relative position-attitude reshaping (DRPAR). OMI reduces collisions under partially observable conditions by extracting perception features and state features separately through an embedding network and feeding the extracted features to a gated recurrent unit (GRU) to enhance observation memory. DRPAR compresses the state space when modeling continuous actions by transforming the movement trajectories of different episodes from an absolute coordinate system into episode-specific local coordinate systems, thereby exploiting trajectory similarity. In addition, three step-wise reward functions are formulated to avoid reward sparsity and facilitate model convergence. We evaluate the proposed method in three 3D scenarios to demonstrate its effectiveness. Compared to other methods, our method converges faster during training and achieves a higher success rate and lower timeout and collision rates during inference. Our method can significantly enhance the autonomy and intelligence of UAV systems under partially observable conditions and provides a reasonable solution for UAV decision-making under uncertainty.
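To make the OMI idea concrete, the following is a minimal sketch of an encoder that embeds perception and state observations separately and passes the concatenated features through a GRU to build observation memory. The class name, layer sizes, activation choices, and single-layer GRU are illustrative assumptions for a PyTorch setting, not the paper's exact architecture.

```python
# Hypothetical OMI-style encoder sketch; dimensions and layers are assumed, not from the paper.
import torch
import torch.nn as nn

class OMIEncoder(nn.Module):
    def __init__(self, perception_dim, state_dim, embed_dim=64, hidden_dim=128):
        super().__init__()
        # Separate embedding networks for perception features and state features
        self.perception_embed = nn.Sequential(
            nn.Linear(perception_dim, embed_dim), nn.ReLU())
        self.state_embed = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU())
        # GRU accumulates a memory over past observations to mitigate partial observability
        self.gru = nn.GRU(2 * embed_dim, hidden_dim, batch_first=True)

    def forward(self, perception_seq, state_seq, hidden=None):
        # perception_seq: (batch, time, perception_dim)
        # state_seq:      (batch, time, state_dim)
        feats = torch.cat(
            [self.perception_embed(perception_seq), self.state_embed(state_seq)],
            dim=-1)
        memory, hidden = self.gru(feats, hidden)
        # 'memory' would feed the PPO actor-critic heads in a full implementation
        return memory, hidden

# Example usage with dummy observation sequences
encoder = OMIEncoder(perception_dim=24, state_dim=12)
mem, h = encoder(torch.randn(4, 16, 24), torch.randn(4, 16, 12))
```

In this sketch, the recurrent hidden state carries a compressed history of observations across time steps, which is what allows the policy to act more reliably when the current observation alone is insufficient.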