2021 | DOI: 10.2514/1.i010961

Cooperative Planning for an Unmanned Combat Aerial Vehicle Fleet Using Reinforcement Learning

Abstract: In this study, reinforcement learning (RL)-based centralized path planning is performed for an unmanned combat aerial vehicle (UCAV) fleet in a human-made hostile environment. The proposed method provides a novel approach in which closing-speed and approximate time-to-go terms are used in the reward function to obtain cooperative motion while satisfying no-fly-zone (NFZ) and time-of-arrival constraints. The proximal policy optimization (PPO) algorithm is used in the training phase of the RL agent. System performan…
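As a rough illustration of the reward-shaping idea described in the abstract, the sketch below combines a closing-speed term, an approximate time-to-go mismatch term, and a no-fly-zone penalty for a single vehicle. All weights, argument names, and the time-to-go approximation are hypothetical placeholders, not the paper's actual formulation.

```python
import numpy as np

def shaped_reward(pos, vel, target, t_go_desired, nfz_centers, nfz_radii,
                  w_close=1.0, w_time=0.5, w_nfz=10.0):
    """Hypothetical per-step reward mixing closing speed, approximate
    time-to-go, and no-fly-zone (NFZ) penalties for one UCAV."""
    rel = target - pos                           # line-of-sight vector to the target
    dist = np.linalg.norm(rel) + 1e-6
    closing_speed = np.dot(vel, rel) / dist      # positive when flying toward the target
    speed = np.linalg.norm(vel) + 1e-6
    t_go = dist / speed                          # crude time-to-go estimate
    reward = w_close * closing_speed             # encourage approach
    reward -= w_time * abs(t_go - t_go_desired)  # penalize time-of-arrival mismatch
    for center, radius in zip(nfz_centers, nfz_radii):
        if np.linalg.norm(pos - center) < radius:
            reward -= w_nfz                      # hard penalty inside any NFZ
    return reward
```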

Cited by 11 publications (4 citation statements)
References 20 publications

Citation statements:
“…The usage of the PPO algorithm in short-range air combat. The second improvement point is to differentiate the neural network inputs. For the basic PPO algorithm, the critic network uses the same input as the actor network, which is named the state space [20]. However, the actor and critic networks play different roles in the algorithm.…”
Section: Algorithm (mentioning, confidence: 99%)
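The point quoted above, giving the critic a richer or different input than the actor, can be sketched as an asymmetric actor-critic module for PPO; the layer sizes and input dimensions below are assumptions for illustration, not the cited paper's architecture.

```python
import torch.nn as nn

class AsymmetricActorCritic(nn.Module):
    """Actor and critic with separate inputs: the actor sees only the red
    side's observation, while the critic sees a fuller environment state."""
    def __init__(self, obs_dim=8, critic_state_dim=10, action_dim=4, hidden=64):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),       # action logits / means
        )
        self.critic = nn.Sequential(
            nn.Linear(critic_state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),                # scalar state-value estimate
        )

    def forward(self, obs, critic_state):
        return self.actor(obs), self.critic(critic_state)
```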
“…The actor network gets its action according to the relationship as viewed by the red aircraft, but the critic network gets its evaluation value based on the state of the air combat environment, which can include information that cannot be observed. Thus, the input for the critic network, $s_{critic}$, is defined as [17]

$$s_{critic} = [D,\ \varphi_r,\ \varphi_b,\ z_b - z_r,\ \psi_D,\ \psi_{br},\ \theta_{br},\ v_r - v_b,\ B_r,\ B_{rb}]^T, \qquad (20)$$

where the subscript $rb$ means the red's value minus the blue's and $br$ is the opposite, and $B_r$ is the red's residual blood. The critic network's state variables are also normalized, and the threshold vector $\delta_{critic}$ is selected as…”
Section: The Critic Network's State Space (mentioning, confidence: 99%)
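One possible reading of Eq. (20) in code: assembling the critic's ten-component state vector and normalizing it element-wise by a threshold vector. The variable names mirror the quoted notation, but the threshold values are purely illustrative assumptions.

```python
import numpy as np

def critic_state(D, phi_r, phi_b, z_r, z_b, psi_D, psi_br, theta_br,
                 v_r, v_b, B_r, B_rb, delta_critic):
    """Build and normalize the critic input of Eq. (20); delta_critic is the
    element-wise threshold (normalization) vector."""
    s = np.array([D, phi_r, phi_b, z_b - z_r, psi_D, psi_br,
                  theta_br, v_r - v_b, B_r, B_rb], dtype=float)
    return s / delta_critic  # element-wise normalization

# Hypothetical thresholds, one per component of s_critic.
delta_critic = np.array([10_000.0, np.pi, np.pi, 5_000.0, np.pi,
                         np.pi, np.pi, 300.0, 100.0, 100.0])
```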
“…Accordingly, optimization and learning techniques are employed in their applications to increase the number of UAV types and extend their operational range. In [1], centralized path planning based on reinforcement learning is implemented for a combat aerial vehicle fleet in order to avoid enemy defense systems. To localize radio-frequency-emitting targets in the operation area, multiple UAVs were deployed in [2] using Particle Filter and Extended Kalman Filter algorithms and vision-based detection.…”
Section: Introduction (mentioning, confidence: 99%)
“…We found that the entropy shrinks to a small value prematurely during the training process, resulting in insufficient exploration in policy learning and therefore a defective policy. To address this problem, we introduce entropy regularization [31] into Equation (3).…”
mentioning, confidence: 99%
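The entropy-regularization remedy quoted above is typically implemented by adding an entropy bonus to PPO's clipped surrogate objective; the sketch below shows that standard pattern, with the coefficient value and tensor names assumed rather than taken from the cited paper.

```python
import torch

def ppo_loss(new_logp, old_logp, advantages, entropy,
             clip_eps=0.2, ent_coef=0.01):
    """PPO clipped surrogate loss with an entropy bonus that discourages
    the policy entropy from collapsing prematurely."""
    ratio = torch.exp(new_logp - old_logp)                 # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()    # clipped surrogate
    return policy_loss - ent_coef * entropy.mean()         # entropy regularization
```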