2022
DOI: 10.1155/2022/9017079
UAV Path Planning Based on Multicritic-Delayed Deep Deterministic Policy Gradient

Abstract: The deep deterministic policy gradient (DDPG) algorithm is a reinforcement learning method that has been widely used in UAV path planning. However, the critic network of DDPG is updated frequently during training, which leads to an inevitable overestimation problem and increases the computational complexity of training. Therefore, this paper presents a multicritic-delayed DDPG method for solving the UAV path planning problem. It uses multicritic networks and delayed learning methods to reduce the overestimation proble…
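The report does not reproduce the paper's implementation, so the following is a minimal PyTorch sketch of the two ideas the abstract names: maintaining several critic networks (taking the minimum over them to damp overestimation) and delaying actor updates. Every dimension, layer size, and hyperparameter here (STATE_DIM, N_CRITICS, POLICY_DELAY, etc.) is an illustrative assumption, not a value from the paper.

```python
import torch
import torch.nn as nn

# Illustrative assumptions only; none of these values come from the paper.
STATE_DIM, ACTION_DIM, N_CRITICS = 12, 4, 3
GAMMA, POLICY_DELAY = 0.99, 2

def make_critic() -> nn.Module:
    # Each critic maps a (state, action) pair to a scalar Q-value.
    return nn.Sequential(
        nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
        nn.Linear(64, 1),
    )

target_critics = [make_critic() for _ in range(N_CRITICS)]

def td_target(reward, next_state, next_action, done):
    # Multicritic idea: take the minimum over all target critics so a
    # single critic's positive error cannot inflate the TD target.
    sa = torch.cat([next_state, next_action], dim=-1)
    q_next = torch.min(torch.stack([tc(sa) for tc in target_critics]),
                       dim=0).values
    return reward + GAMMA * (1.0 - done) * q_next

def should_update_actor(step: int) -> bool:
    # Delayed learning: update the actor (and soft-update the targets)
    # only once every POLICY_DELAY critic updates.
    return step % POLICY_DELAY == 0

# Usage with dummy tensors:
r = torch.zeros(5, 1)
s2 = torch.randn(5, STATE_DIM)
a2 = torch.randn(5, ACTION_DIM)
d = torch.zeros(5, 1)
y = td_target(r, s2, a2, d)   # TD targets, shape (5, 1)
```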


Cited by 8 publications (9 citation statements) | References 34 publications
“…Our method is compared experimentally with the methods proposed by Li [28], Zhao [29], and Wu [30]. As shown in Figure 18, our proposed method has the highest average reward and the best training effect, followed by the methods of Zhao and Wu, whose training effects are roughly the same; the method proposed by Li has the lowest average reward.…”
Section: Comparison and Analysis With Other Methods
confidence: 99%
“…However, the convergence speed of the algorithm is not ideal and still needs improvement. Reference [30] proposed a multicritic-delayed DDPG method for UAV path planning. Overestimation in DDPG is reduced by using multicritic networks and delayed learning methods, and noise is introduced to improve robustness in real environments.…”
Section: Introduction
confidence: 99%
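The statement above mentions injecting noise for robustness but not the noise model the paper uses. A minimal sketch, assuming zero-mean Gaussian observation noise with a hypothetical standard deviation `sigma`:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(observation: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    # Hypothetical noise model: zero-mean Gaussian noise added to each
    # observation component during training to emulate sensor noise.
    return observation + rng.normal(0.0, sigma, size=observation.shape)

noisy = perturb(np.zeros(6))  # e.g. a 6-dimensional UAV state vector
```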
“…The validity of Equation (5) has already been proven [29, 30]. Even a zero-mean error in the initial state will lead to overestimation of the action value as the value function is updated, and the adverse effect of this error is gradually amplified by the Bellman-equation calculation.…”
Section: Error Analysis
confidence: 99%
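Equation (5) itself is not reproduced in this report, but the standard argument behind the quoted claim can be sketched, assuming independent zero-mean errors on the Q-estimates:

```latex
% Sketch only; Equation (5) of the cited paper is not shown in this report.
% Assume the estimate \hat{Q}(s,a) = Q(s,a) + \epsilon_a with
% \mathbb{E}[\epsilon_a] = 0 for every action a.
\[
  \mathbb{E}\!\left[\max_a \hat{Q}(s,a)\right]
  \;\ge\; \max_a \mathbb{E}\!\left[\hat{Q}(s,a)\right]
  \;=\; \max_a Q(s,a),
\]
% so even zero-mean noise biases the max upward, and the Bellman backup
% y = r + \gamma \max_a \hat{Q}(s',a) propagates this bias into later updates.
```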
“…Specifically, to improve network generalization, a deep V-network with an attention mechanism is designed in the local RL planner, and a multi-stage, multi-scenario training strategy is adopted in the training process. The contributions of this work are as follows: unlike the literature [13,14,15,16,17,18,19], the method proposed in this paper can realize safe and fast path planning while satisfying the second-order kinematics model and constraints of fixed-wing UAVs; for the motion planning problem studied in this paper, a deep V-network based on the attention mechanism is adopted with a multi-stage, multi-scenario training strategy to improve training efficiency and network generalization; and the effectiveness of the algorithm is verified by comparative simulation experiments.…”
Section: Introduction
confidence: 99%
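The report gives no architectural details for the cited attention-based deep V-network, so the following is a minimal sketch, assuming a single self-attention layer over a tokenized observation (e.g. one token per nearby obstacle or waypoint). Every name and size here (AttentionVNet, TOKEN_DIM, the 4-head attention, mean pooling) is a hypothetical choice, not the paper's design.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the cited paper's actual architecture is not given.
OBS_TOKENS, TOKEN_DIM, EMBED_DIM = 8, 6, 32

class AttentionVNet(nn.Module):
    """State-value network with one self-attention layer over observation tokens."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(TOKEN_DIM, EMBED_DIM)
        self.attn = nn.MultiheadAttention(EMBED_DIM, num_heads=4,
                                          batch_first=True)
        self.head = nn.Linear(EMBED_DIM, 1)

    def forward(self, obs_tokens: torch.Tensor) -> torch.Tensor:
        # obs_tokens: (batch, OBS_TOKENS, TOKEN_DIM)
        x = self.embed(obs_tokens)
        x, _ = self.attn(x, x, x)          # self-attention across tokens
        return self.head(x.mean(dim=1))    # pooled state value V(s)

v_net = AttentionVNet()
value = v_net(torch.randn(2, OBS_TOKENS, TOKEN_DIM))  # shape (2, 1)
```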