2021
DOI: 10.1155/2021/5169460

Research on Dynamic Path Planning of Mobile Robot Based on Improved DDPG Algorithm

Abstract: To address the low success rate and slow learning speed of the DDPG algorithm in path planning of a mobile robot in a dynamic environment, an improved DDPG algorithm is designed. In this article, the RAdam algorithm replaces the neural network optimizer in DDPG and is combined with a curiosity algorithm to improve the success rate and convergence speed. On top of the improved algorithm, prioritized experience replay is added and transfer learning is introduced to improve the training effect. Thro…
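The abstract's central change is swapping DDPG's usual optimizer for RAdam. The sketch below shows what that swap looks like in PyTorch; the network sizes, learning rates, and state/action dimensions are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (assumed hyperparameters) of replacing DDPG's Adam optimizer
# with RAdam for the actor and critic networks, as the abstract describes.
import torch
import torch.nn as nn

state_dim, action_dim = 24, 2   # assumed dimensions for a mobile-robot task

actor = nn.Sequential(
    nn.Linear(state_dim, 256), nn.ReLU(),
    nn.Linear(256, action_dim), nn.Tanh(),
)
critic = nn.Sequential(
    nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
    nn.Linear(256, 1),
)

# RAdam (rectified Adam) in place of the usual Adam optimizer
actor_opt = torch.optim.RAdam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.RAdam(critic.parameters(), lr=1e-3)
```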

Cited by 16 publications (15 citation statements) · References 20 publications
“…The information in the environment (obstacle location, shape, and orientation) is completely unknown before the mobile robot performs a path-planning task, and only the starting and goal points are known. Mobile robots need to plan the shortest path length from the starting point to the goal point without colliding with obstacles in the environment [30].…”
Section: Preliminaries
confidence: 99%
“… Q-learning requires a certain amount of memory to store the Q-table. When the mobile robot (MR) has m states and n actions, the constituted Q-table has dimension m∗n, and choosing the maximum Q value to determine the direction of the next move requires a total of m∗(n−1) comparisons; as the state space and action set grow more complex, the amount of computation increases exponentially, resulting in long computation times [30]. It will lead to a local optimum when the environment is complex, and it is easy to fall into a dead-end path blocked by obstacles.…”
Section: Preliminaries
confidence: 99%
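As a rough illustration of the memory and comparison costs this quote describes, the sketch below stores an m×n Q-table and selects a greedy action by scanning one row; the state and action counts are arbitrary assumptions chosen only to show the scaling.

```python
# Illustrative only: tabular Q-learning storage and greedy action selection.
# With m states and n actions the Q-table holds m*n entries, and picking the
# best action for one state takes n-1 pairwise comparisons.
import numpy as np

m, n = 10_000, 8                      # assumed state/action counts
q_table = np.zeros((m, n))            # m*n stored Q-values

def greedy_action(state_index: int) -> int:
    # argmax over one row of the table: n-1 comparisons for this state
    return int(np.argmax(q_table[state_index]))

print(q_table.size)                   # 80000 entries to keep in memory
print(greedy_action(0))               # action index in [0, n-1]
```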
“…In order to fully evaluate the performance of the prioritized experience replay algorithm proposed in this paper, it is integrated into DDPG and compared with the algorithms proposed by Cicek [23] , Xu [25] , Cao [26] , and Li [27] , respectively. In the simulation environment shown in Fig.…”
Section: Comparison With Other Algorithms
confidence: 99%
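For context, the sketch below shows the generic proportional form of prioritized experience replay that such DDPG variants build on; the priority definition and hyperparameters are the standard |TD-error|-based ones, not the specific scheme of any of the cited papers.

```python
# Generic sketch of proportional prioritized experience replay (assumed
# hyperparameters); transitions with larger |TD-error| are sampled more often.
import numpy as np

class PrioritizedReplay:
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def add(self, transition, td_error):
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.data) >= self.capacity:          # drop the oldest entry
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        probs = np.array(self.priorities) / sum(self.priorities)
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        return [self.data[i] for i in idx], idx
```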
“…Cao et al. [26] integrated TD-error, Q-value, and data volume, focused on different importance indicators at different training stages of the neural network, and dynamically adjusted the weight of each indicator to achieve an adaptive estimate of experience importance. Li et al. [27] introduced an intrinsic curiosity module (ICM) to provide internal rewards for the robot's training process, which were combined with external rewards provided by environmental feedback, and then introduced prioritized experience replay and transfer learning to improve the success rate and convergence
Section: Introduction
confidence: 99%
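The two ideas in that passage, a weighted mix of importance indicators for replay priority and an intrinsic curiosity reward added to the environment reward, can be summarized with the hedged sketch below; the weights, the forward-model error, and the scale factor are placeholders, not the cited papers' formulations.

```python
# Hedged sketch of the two mechanisms described above (placeholder weights).
def adaptive_priority(td_error, q_value, data_volume, w):
    # Cao et al. [26]: mix several importance indicators, with the weight
    # vector w adjusted over the course of training.
    return w[0] * abs(td_error) + w[1] * abs(q_value) + w[2] * data_volume

def total_reward(extrinsic_r, forward_model_error, eta=0.1):
    # Li et al. [27]: an intrinsic curiosity reward (forward-model prediction
    # error) is added to the extrinsic reward from the environment.
    return extrinsic_r + eta * forward_model_error
```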
“…In this method, mixed noise along with a more reasonable reward function was used for quick training. The proposed algorithm and DDPG-based algorithms [23, 24] were experimentally compared, and the advantages of the proposed algorithm were demonstrated in a complex environment in terms of exploration efficiency, optimal path, and time. In Zhou et al. [7], an improved DQN algorithm was proposed for the path-planning problem of patrolling robots.…”
Section: Introduction
confidence: 99%
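The cited method's "mixed noise" is not spelled out in this excerpt, so the sketch below shows only one plausible reading: blending temporally correlated Ornstein-Uhlenbeck noise with uncorrelated Gaussian noise for DDPG action exploration. All parameters are assumptions.

```python
# Assumed interpretation of "mixed noise": a blend of Ornstein-Uhlenbeck and
# Gaussian exploration noise added to the DDPG policy's actions.
import numpy as np

class MixedNoise:
    def __init__(self, dim, theta=0.15, sigma_ou=0.2, sigma_gauss=0.1, mix=0.5):
        self.dim, self.theta = dim, theta
        self.sigma_ou, self.sigma_gauss, self.mix = sigma_ou, sigma_gauss, mix
        self.state = np.zeros(dim)

    def sample(self):
        # OU step: mean-reverting, temporally correlated component
        self.state += -self.theta * self.state + self.sigma_ou * np.random.randn(self.dim)
        gauss = self.sigma_gauss * np.random.randn(self.dim)  # uncorrelated component
        return self.mix * self.state + (1.0 - self.mix) * gauss
```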