UAV Path Planning Based on Multicritic-Delayed Deep Deterministic Policy Gradient

Wu, Runjia; Gu, Fangqing; Liu, Hailin; Shi, Hongcan

doi:10.1155/2022/9017079

Cited by 8 publications

(9 citation statements)

References 34 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Our method is compared with the methods proposed by Li [28], Zhao [29] and Wu [30] for experiments. As shown in Figure 18, our proposed method has the highest average reward and the best training effect, followed by the method proposed by Zhao and Wu, whose training effects are roughly the same, and the method proposed by Li has the lowest average reward.…”

Section: Comparison and Analysis With Other Methodsmentioning

confidence: 99%

“…However, the convergence speed of the algorithm is not ideal, and it still needs to be improved. Reference [30] proposed a multi-critical delay DDPG method for UAV path planning. DDPG overestimation is reduced by using multi-critical networks and delayed learning methods and introducing noise to improve the robustness in the real environment.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Intelligent path planning of mobile robot based on Deep Deterministic Policy Gradient

Gong

Wang

et al. 2022

Preprint

View full text Add to dashboard Cite

Deep Deterministic Policy Gradient (DDPG) is a deep reinforcement learning algorithm that is widely used in the path planning of mobile robots. It solves the continuous action space problem and can ensure the continuity of mobile robot motion using the Actor-Critic framework, which has great potential in the field of mobile robot path planning. However, because the Critic network always selects the maximum Q value to evaluate the actions of mobile robot, there is the problem of inaccurate Q value estimation. In addition, DDPG adopts a random uniform sampling method, which can’t efficiently use the more important sample data, resulting in slow convergence speed during the training of the path planning model and easily falling into local optimum. In this paper, a dueling network is introduced based on DDPG to improve the estimation accuracy of the Q value, and the reward function is optimized to increase the immediate reward, to direct the mobile robot to move faster toward the target point. To further improve the efficiency of experience replay, a single experience pool is separated into two by comprehensively considering the influence of average reward and TD-error on the importance of samples, and a dynamic adaptive sampling mechanism is adopted to sample the two experience pools separately. Finally, experiments were carried out in the simulation environment created with the ROS system and the Gazebo platform. The results of the experiments show that the proposed path planning algorithm has a fast convergence speed and high stability, and the success rate can reach 100% and 93% in the environment without obstacles and with obstacles, respectively.

show abstract

Section: Comparison and Analysis With Other Methodsmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Intelligent path planning of mobile robot based on Deep Deterministic Policy Gradient

Gong

Wang

et al. 2022

Preprint

View full text Add to dashboard Cite

show abstract

“…Equation ( 5) has already been proved for its establishment [29,30]. Even the zero mean error of the initial state will lead to an overestimation of the action value due to the update of the value function, and the adverse efect of this error will be gradually enlarged by the calculation of the Bellman equation.…”

Section: Error Analysis It Is An Inevitable Problem For Q-mentioning

confidence: 99%

Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms

Zhang¹,

Zhang

et al. 2022

Computational Intelligence and Neuroscience

View full text Add to dashboard Cite

The traditional Deep Deterministic Policy Gradient (DDPG) algorithm has been widely used in continuous action spaces, but it still suffers from the problems of easily falling into local optima and large error fluctuations. Aiming at these deficiencies, this paper proposes a dual-actor-dual-critic DDPG algorithm (DN-DDPG). First, on the basis of the original actor-critic network architecture of the algorithm, a critic network is added to assist the training, and the smallest Q value of the two critic networks is taken as the estimated value of the action in each update. Reduce the probability of local optimal phenomenon; then, introduce the idea of dual-actor network to alleviate the underestimation of value generated by dual-evaluator network, and select the action with the greatest value in the two-actor networks to update to stabilize the training of the algorithm process. Finally, the improved method is validated on four continuous action tasks provided by MuJoCo, and the results show that the improved method can reduce the fluctuation range of error and improve the cumulative return compared with the classical algorithm.

show abstract

“…Specifically, to improve the network generalization, a deep V-network with an attention mechanism is designed in local RL planner and a multi-stage, multi-scenario training strategy is adopted in the training process. The contributions of this work are as follows: Unlike the literature [ 13 , 14 , 15 , 16 , 17 , 18 , 19 ], the method proposed in this paper can realize safe and fast path planning while satisfying the second-order kinematics model and constraints of fixed-wing UAVs; For the motion planning problem studied in this paper, a deep V-network based on the attention mechanism is adopted with a multi-stage, multi-scenario training strategy to improve training efficiency and network generalization. The effectiveness of the algorithm is verified by comparison simulation experiments.…”

Section: Introductionmentioning

confidence: 99%

“…Unlike the literature [ 13 , 14 , 15 , 16 , 17 , 18 , 19 ], the method proposed in this paper can realize safe and fast path planning while satisfying the second-order kinematics model and constraints of fixed-wing UAVs;…”

Section: Introductionmentioning

confidence: 99%

Globally Guided Deep V-Network-Based Motion Planning Algorithm for Fixed-Wing Unmanned Aerial Vehicles

Du,

You,

Zhao

2024

Sensors

View full text Add to dashboard Cite

Fixed-wing UAVs have shown great potential in both military and civilian applications. However, achieving safe and collision-free flight in complex obstacle environments is still a challenging problem. This paper proposed a hierarchical two-layer fixed-wing UAV motion planning algorithm based on a global planner and a local reinforcement learning (RL) planner in the presence of static obstacles and other UAVs. Considering the kinematic constraints, a global planner is designed to provide reference guidance for ego-UAV with respect to static obstacles. On this basis, a local RL planner is designed to accomplish kino-dynamic feasible and collision-free motion planning that incorporates dynamic obstacles within the sensing range. Finally, in the simulation training phase, a multi-stage, multi-scenario training strategy is adopted, and the simulation experimental results show that the performance of the proposed algorithm is significantly better than that of the baseline method.

show abstract

UAV Path Planning Based on Multicritic-Delayed Deep Deterministic Policy Gradient

Cited by 8 publications

References 34 publications

Intelligent path planning of mobile robot based on Deep Deterministic Policy Gradient

Intelligent path planning of mobile robot based on Deep Deterministic Policy Gradient

Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms

Globally Guided Deep V-Network-Based Motion Planning Algorithm for Fixed-Wing Unmanned Aerial Vehicles

Contact Info

Product

Resources

About