To address the slow convergence and long planned paths that arise when a robot plans a path in an unknown environment with the Q-learning algorithm, we propose the Experience-Memory Q-Learning (EMQL) algorithm, which is based on continuously updating the shortest known distance from the current state node to the start point. The robot's autonomous learning ability is enhanced by assigning distinct roles to two tables in the proposed algorithm. An EM table of dimension m × 1 records the distance information and reflects the robot's learning progress. The Q table serves as auxiliary guidance for the experience transfer and experience reuse strategies, which enable the robot to accomplish its task even when the destination changes or the path is blocked. Furthermore, the robot's learning efficiency under the EMQL algorithm is improved by a dual reward mechanism consisting of a static reward and a dynamic reward. The static reward prevents the robot from exploring any single state node excessively, while the dynamic reward keeps the robot from searching blindly in the unknown environment. We evaluate the proposed algorithm on both grid maps and road network maps. Comparisons of planning time, number of iterations, and path length show that EMQL outperforms the Q-learning algorithm in convergence speed and optimization ability. Additionally, the practicality of the proposed algorithm is validated in a real-world experiment using a TurtleBot3 Burger robot.
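To make the two table roles concrete, the following is a minimal sketch of how an EM table and an auxiliary Q table could be maintained, assuming a shortest-path-style relaxation of the distance back to the start node and placeholder values for the static and dynamic rewards; the grid size, step cost, reward magnitudes, and function names are illustrative assumptions, not specifications from the paper.

```python
import numpy as np

# Assumed setup: a 5 x 5 grid flattened into m = 25 state nodes, 4 actions.
n_states, n_actions = 25, 4
EM = np.full(n_states, np.inf)     # EM table (m x 1): shortest known distance to the start node
EM[0] = 0.0                        # start node is at distance zero from itself
Q = np.zeros((n_states, n_actions))  # auxiliary Q table used to guide transfer/reuse
visit_count = np.zeros(n_states)     # visit counter used by the static reward below

def update_em(s, s_next, step_cost=1.0):
    """Relax EM[s_next] through s (assumed update rule for the EM table)."""
    EM[s_next] = min(EM[s_next], EM[s] + step_cost)

def dual_reward(s_next, reached_goal):
    """Illustrative dual reward: a static penalty that discourages revisiting
    the same node excessively, plus a dynamic term tied to the EM distance
    so exploration is pushed away from already well-known regions."""
    static_r = -0.1 * visit_count[s_next]   # assumed static shaping term
    dynamic_r = 0.05 * min(EM[s_next], 10)  # assumed dynamic shaping term
    return static_r + dynamic_r + (10.0 if reached_goal else 0.0)

# Example of one learning step (alpha, gamma are ordinary Q-learning parameters).
def step(s, a, s_next, reached_goal, alpha=0.1, gamma=0.9):
    visit_count[s_next] += 1
    update_em(s, s_next)
    r = dual_reward(s_next, reached_goal)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```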
INDEX TERMS: Path planning, Q-learning, experience memory, experience transfer, experience reuse.

HUI LU (Senior Member, IEEE) received the Ph.D. degree in navigation, guidance, and control from Harbin Engineering University, Harbin, China, in 2004. She is currently a Professor with Beihang University and a member of the Shaanxi Key Laboratory of Integrated and Intelligent Navigation. Her research interests include information and communication systems, intelligent optimization, and fault diagnosis and prediction.