Vision-Based Robot Navigation through Combining Unsupervised Learning and Hierarchical Reinforcement Learning

Zhou, Xiaomao; Bai, Tao; Gao, Yanbin; Han, Yuntao

doi:10.3390/s19071576

Cited by 22 publications

(13 citation statements)

References 60 publications

(77 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Reinforcement learning is a method of machine learning, its essence is to find an optimal decision through continuous interaction with the environment [30]. The idea of reinforcement learning is as follows: The agent affects the environment by performing actions.…”

Section: Q-learning Optimization Algorithmmentioning

confidence: 99%

Indoor Emergency Path Planning Based on the Q-Learning Optimization Algorithm

Gu²,

et al. 2022

IJGI

View full text Add to dashboard Cite

The internal structure of buildings is becoming increasingly complex. Providing a scientific and reasonable evacuation route for trapped persons in a complex indoor environment is important for reducing casualties and property losses. In emergency and disaster relief environments, indoor path planning has great uncertainty and higher safety requirements. Q-learning is a value-based reinforcement learning algorithm that can complete path planning tasks through autonomous learning without establishing mathematical models and environmental maps. Therefore, we propose an indoor emergency path planning method based on the Q-learning optimization algorithm. First, a grid environment model is established. The discount rate of the exploration factor is used to optimize the Q-learning algorithm, and the exploration factor in the ε-greedy strategy is dynamically adjusted before selecting random actions to accelerate the convergence of the Q-learning algorithm in a large-scale grid environment. An indoor emergency path planning experiment based on the Q-learning optimization algorithm was carried out using simulated data and real indoor environment data. The proposed Q-learning optimization algorithm basically converges after 500 iterative learning rounds, which is nearly 2000 rounds higher than the convergence rate of the Q-learning algorithm. The SASRA algorithm has no obvious convergence trend in 5000 iterations of learning. The results show that the proposed Q-learning optimization algorithm is superior to the SARSA algorithm and the classic Q-learning algorithm in terms of solving time and convergence speed when planning the shortest path in a grid environment. The convergence speed of the proposed Q- learning optimization algorithm is approximately five times faster than that of the classic Q- learning algorithm. The proposed Q-learning optimization algorithm in the grid environment can successfully plan the shortest path to avoid obstacle areas in a short time.

show abstract

Section: Q-learning Optimization Algorithmmentioning

confidence: 99%

Indoor Emergency Path Planning Based on the Q-Learning Optimization Algorithm

Gu²,

et al. 2022

IJGI

View full text Add to dashboard Cite

show abstract

“…In recent years, with the rapid development of artificial intelligence and deep reinforcement learning (DRL) technology, learning-based path planning, obstacle detection, trafficability analysis and other technologies have been widely concerned by researchers [ 13 , 14 , 15 ]. DRL has the advantages of not requiring environmental maps, strong learning capabilities, and high dynamic adaptability.…”

Section: Introductionmentioning

confidence: 99%

Learning-Based End-to-End Path Planning for Lunar Rovers with Safety Constraints

Wang

Zhang

2021

Sensors

View full text Add to dashboard Cite

Path planning is an essential technology for lunar rover to achieve safe and efficient autonomous exploration mission, this paper proposes a learning-based end-to-end path planning algorithm for lunar rovers with safety constraints. Firstly, a training environment integrating real lunar surface terrain data was built using the Gazebo simulation environment and a lunar rover simulator was created in it to simulate the real lunar surface environment and the lunar rover system. Then an end-to-end path planning algorithm based on deep reinforcement learning method is designed, including state space, action space, network structure, reward function considering slip behavior, and training method based on proximal policy optimization. In addition, to improve the generalization ability to different lunar surface topography and different scale environments, a variety of training scenarios were set up to train the network model using the idea of curriculum learning. The simulation results show that the proposed planning algorithm can successfully achieve the end-to-end path planning of the lunar rover, and the path generated by the proposed algorithm has a higher safety guarantee compared with the classical path planning algorithm.

show abstract

“…Reinforcement learning (RL), one of methodologies of machine learning, is used to describe and solve how an intelligent agent learns and optimizes the strategy during the interaction with the environment [1]. To be more specific, the intelligent agent acquires the reinforcement signal (reward feedback) from the environment during the continuous interaction with the environment, and adjusts its own action strategy through the reward feedback, aiming at the maximum gain.…”

Section: Introductionmentioning

confidence: 99%

A Self-Adaptive Reinforcement-Exploration Q-Learning Algorithm

et al. 2021

View full text Add to dashboard Cite

Directing at various problems of the traditional Q-Learning algorithm, such as heavy repetition and disequilibrium of explorations, the reinforcement-exploration strategy was used to replace the decayed ε-greedy strategy in the traditional Q-Learning algorithm, and thus a novel self-adaptive reinforcement-exploration Q-Learning (SARE-Q) algorithm was proposed. First, the concept of behavior utility trace was introduced in the proposed algorithm, and the probability for each action to be chosen was adjusted according to the behavior utility trace, so as to improve the efficiency of exploration. Second, the attenuation process of exploration factor ε was designed into two phases, where the first phase centered on the exploration and the second one transited the focus from the exploration into utilization, and the exploration rate was dynamically adjusted according to the success rate. Finally, by establishing a list of state access times, the exploration factor of the current state is adaptively adjusted according to the number of times the state is accessed. The symmetric grid map environment was established via OpenAI Gym platform to carry out the symmetrical simulation experiments on the Q-Learning algorithm, self-adaptive Q-Learning (SA-Q) algorithm and SARE-Q algorithm. The experimental results show that the proposed algorithm has obvious advantages over the first two algorithms in the average number of turning times, average inside success rate, and number of times with the shortest planned route.

show abstract

Vision-Based Robot Navigation through Combining Unsupervised Learning and Hierarchical Reinforcement Learning

Cited by 22 publications

References 60 publications

Indoor Emergency Path Planning Based on the Q-Learning Optimization Algorithm

Indoor Emergency Path Planning Based on the Q-Learning Optimization Algorithm

Learning-Based End-to-End Path Planning for Lunar Rovers with Safety Constraints

A Self-Adaptive Reinforcement-Exploration Q-Learning Algorithm

Contact Info

Product

Resources

About