Dyna-Q is a reinforcement learning method widely used in AGV path planning. However, in large, complex, dynamic environments, the sparse reward function of Dyna-Q and the large search space lead to low search efficiency, slow convergence, and even failure to converge, which seriously reduces the method's performance and practicality. To address these problems, this paper proposes an Improved Dyna-Q algorithm for AGV path planning in large complex dynamic environments. First, to reduce the large search space, this paper proposes a global path guidance mechanism based on a heuristic graph, which effectively shrinks the path search space and thus improves the efficiency of obtaining the optimal path. Second, to address the sparse reward function of Dyna-Q, this paper proposes a novel dynamic reward function and a heuristic-graph-based action selection method, which provide denser feedback and more efficient action decisions for AGV path planning, effectively improving the convergence of the algorithm. We evaluated our approach in scenarios with both static and dynamic obstacles. The experimental results show that the proposed algorithm obtains better paths more efficiently than other reinforcement-learning-based methods, including the classical Q-Learning and Dyna-Q algorithms.
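A minimal sketch of the key ideas, assuming a grid world where the heuristic graph is a BFS distance-to-goal map. The shaping term, the guided epsilon-greedy rule, and all parameter values here are illustrative stand-ins; the paper's exact reward function and action-selection method are not reproduced.

```python
import random
from collections import deque

def build_heuristic_graph(grid, goal):
    # BFS from the goal over free cells (0 = free, 1 = obstacle).
    # h[cell] is the obstacle-free step distance to the goal and stands in
    # for the paper's heuristic graph.
    h = {goal: 0}
    queue = deque([goal])
    rows, cols = len(grid), len(grid[0])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in h):
                h[nxt] = h[(r, c)] + 1
                queue.append(nxt)
    return h

def improved_dyna_q(grid, start, goal, episodes=200, planning_steps=20,
                    alpha=0.1, gamma=0.95, epsilon=0.1, max_steps=500):
    h = build_heuristic_graph(grid, goal)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    Q = {}      # (state, action index) -> value
    model = {}  # (state, action index) -> (reward, next state)

    def step(s, a):
        # Dense shaped reward: a step cost plus the decrease in heuristic
        # distance, so progress toward the goal is rewarded immediately
        # instead of only at the terminal state.
        nxt = (s[0] + moves[a][0], s[1] + moves[a][1])
        if nxt not in h:                      # off-grid or obstacle: stay put
            return s, -5.0
        if nxt == goal:
            return nxt, 100.0
        return nxt, -1.0 + (h[s] - h[nxt])

    def choose_action(s):
        # Heuristic-guided epsilon-greedy: exploration prefers actions that
        # reduce the heuristic distance to the goal.
        if random.random() < epsilon:
            guided = [a for a in range(4)
                      if (s[0] + moves[a][0], s[1] + moves[a][1]) in h
                      and h[(s[0] + moves[a][0], s[1] + moves[a][1])] < h[s]]
            return random.choice(guided) if guided else random.randrange(4)
        return max(range(4), key=lambda a: Q.get((s, a), 0.0))

    def update(s, a, r, nxt):
        best = max(Q.get((nxt, b), 0.0) for b in range(4))
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best
                                                  - Q.get((s, a), 0.0))

    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            if s == goal:
                break
            a = choose_action(s)
            nxt, r = step(s, a)
            update(s, a, r, nxt)
            model[(s, a)] = (r, nxt)
            # Dyna-Q planning: replay transitions from the learned model.
            for _ in range(planning_steps):
                (ps, pa), (pr, pn) = random.choice(list(model.items()))
                update(ps, pa, pr, pn)
            s = nxt
    return Q
```

The heuristic graph plays two roles in this sketch: it densifies the reward through the shaping term h(s) - h(s'), and it prunes the effective search space by steering exploratory actions toward the goal, which is the same division of labor the abstract describes.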
Motion planning is an important research topic in robotics. As an improvement of the Rapidly-exploring Random Tree (RRT) algorithm, the RRT* motion planning algorithm is widely used because of its asymptotic optimality. However, the running time of RRT* grows rapidly with the number of potential path vertices, resulting in slow convergence or even failure to converge, which seriously reduces the performance and practical value of RRT*. To address this issue, this paper proposes a two-phase motion planning algorithm named Metropolis RRT* (M-RRT*), based on the Metropolis acceptance criterion. First, to obtain the initial path efficiently and enter the optimal path search phase earlier, an asymptotic vertex acceptance criterion is defined for the initial path estimation phase of M-RRT*. Second, to improve the convergence rate, a nonlinear dynamic vertex acceptance criterion is defined for the optimal path search phase, which preferentially accepts vertices likely to improve the current path. The effectiveness of M-RRT* is verified through simulations in three test environments, comparing it against existing algorithms.
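A minimal sketch of how a Metropolis-style acceptance test can gate sampled vertices in an RRT*-style planner. The straight-line cost lower bound and the geometric cooling schedule are our own assumptions, and the sketch collapses the paper's two phase-specific criteria into a single temperature-controlled test for brevity; it is not the paper's exact formulation.

```python
import math
import random

def cost_lower_bound(sample, start, goal):
    # Admissible estimate for any path forced through `sample`:
    # straight-line distance start -> sample -> goal.
    return math.dist(start, sample) + math.dist(sample, goal)

def metropolis_accept(candidate_cost, best_cost, iteration,
                      t0=1.0, decay=0.995):
    # Metropolis acceptance test: vertices that could improve the current
    # best path are always accepted; worse vertices are accepted with a
    # probability that shrinks as the temperature cools, so sampling
    # concentrates on promising regions over time.
    if candidate_cost < best_cost:
        return True
    temperature = max(t0 * decay ** iteration, 1e-9)  # nonlinear (geometric) cooling
    delta = candidate_cost - best_cost
    return random.random() < math.exp(-delta / temperature)

# Inside an RRT*-style loop, each random sample would be screened before the
# usual nearest/steer/rewire steps (extend_and_rewire is a placeholder for
# the standard RRT* extension step, not a real API):
#   if metropolis_accept(cost_lower_bound(x_rand, start, goal),
#                        best_path_cost, i):
#       extend_and_rewire(x_rand)
```

Early on, the high temperature keeps exploration broad so an initial path is found quickly; as the temperature cools, samples that cannot shorten the current path are increasingly rejected, which mirrors the focusing behavior the abstract attributes to the optimal path search phase.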