Q-learning is an important algorithm in artificial intelligence and plays a significant part in many fields, such as driverless technology, industrial automation, health care, intelligent search, and games. It is a classical model-free reinforcement learning algorithm: from experienced action sequences in a Markov environment, an agent learns to select the best course of action. This paper reviews several applications of Q-learning to path planning. It discusses adding received signal strength (RSS) to the Q-learning algorithm to navigate an unmanned aerial vehicle (UAV); summarizes the main content and results of a neural Q-learning algorithm that helps a UAV avoid obstacles; describes the adaptive and random exploration (ARE) method proposed for UAV route-planning tasks; summarizes the content and results of mobile-robot route design that uses obstacle characteristics as Q-learning states and actions; describes a Q-learning algorithm that plans a mobile robot's path with a novel exploration technique combining ε-greedy exploration with Boltzmann exploration; and analyzes the convergence speed of the stage Q-learning path planning algorithm and of the traditional Q-learning path planning algorithm. When there are many states and actions, the efficiency of Q-learning drops sharply, so further in-depth study is needed on how to reduce its running time and increase its convergence speed.
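To make the model-free learning loop described above concrete, the following is a minimal sketch of tabular Q-learning with ε-greedy exploration. It is not the implementation of any method reviewed in this paper: the 4x4 grid, the reward values, the hyperparameters, and the helper names (`step`, `train`, `greedy_path`) are all assumptions chosen for illustration.

```python
import random

# Hypothetical 4x4 map: 'S' start, 'G' goal, '#' obstacle, '.' free cell.
GRID = ["S..#",
        ".#..",
        "...#",
        "#..G"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply an action; leaving the grid or hitting an obstacle keeps the agent in place."""
    r, c = state
    nr, nc = r + action[0], c + action[1]
    inside = 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])
    if not inside or GRID[nr][nc] == "#":
        return state, -5.0, False      # assumed collision penalty
    if GRID[nr][nc] == "G":
        return (nr, nc), 10.0, True    # assumed goal reward
    return (nr, nc), -1.0, False       # assumed per-step cost


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table with the standard one-step Q-learning update."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS)
         for r in range(len(GRID)) for c in range(len(GRID[0]))}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(200):  # cap on episode length
            # epsilon-greedy: explore with probability epsilon, else exploit.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            nxt, reward, done = step(state, ACTIONS[a])
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, start=(0, 0), max_steps=20):
    """Follow the highest-valued action from the start cell."""
    path, state = [start], start
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path
```

Calling `greedy_path(train())` returns the list of cells the learned policy visits from the start cell. The Q-table holds one entry per state-action pair, so its size and the number of updates needed both grow with the number of states and actions, which is the efficiency concern raised above.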