2021
DOI: 10.1007/978-3-030-84910-8_15

A Movement Adjustment Method for DQN-Based Autonomous Aerial Vehicle

Cited by 4 publications (3 citation statements)
References 22 publications
“…Nobuki et al. [25] proposed a DQN algorithm based on a taboo list strategy (TLS-DQN) for indoor single-path environments. Although this algorithm can find a path to the target point in simple indoor environments, it cannot guarantee effective path planning in complex environments because its simulations were limited to environments with few obstacles.…”
Section: A. Algorithm Based on DQN
Mentioning confidence: 99%
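To make the taboo-list idea concrete, the sketch below shows epsilon-greedy DQN action selection that masks moves returning to recently visited grid cells. This is a minimal illustration with assumed names (`select_action`, `next_position_of`, a four-move grid agent and dummy Q values); the actual taboo-list rule used by TLS-DQN in [25] may differ.

```python
import random
from collections import deque

import numpy as np


def select_action(q_values, taboo_positions, next_position_of, epsilon=0.1):
    # Epsilon-greedy choice restricted to actions whose successor cell is not taboo.
    actions = list(range(len(q_values)))
    allowed = [a for a in actions if next_position_of(a) not in taboo_positions]
    if not allowed:
        allowed = actions  # every move is taboo: fall back to the full action set
    if random.random() < epsilon:
        return random.choice(allowed)
    return max(allowed, key=lambda a: q_values[a])


# Usage: keep a bounded list of recently visited cells as the taboo list.
moves = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}  # hypothetical action map
taboo = deque(maxlen=20)
pos = (5, 5)
q = np.array([0.4, 1.2, -0.3, 0.8])  # Q(s, a) from the network (dummy values)
a = select_action(q, set(taboo),
                  lambda act: (pos[0] + moves[act][0], pos[1] + moves[act][1]))
taboo.append(pos)
```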
“…[Flattened comparison table of DQN/DDQN-based path-planning methods; the original row and column alignment is not recoverable.] Methods listed: DQN [24] (Gu et al., 2022); TLS-DQN [25]; DDQN combined with ADQN [27] (Zhang et al., 2022); DDQN based on a greedy strategy [28]; DDQN based on prior knowledge with an integrated action mask method [29] (Yang et al., 2022); DDQN based on a dynamic compound reward function [30]; MS-DDQN [31] (Peng et al., 2021); ECMS-DDQN [32]. Noted improvements include decoupled action selection and action evaluation, accelerated convergence, improved learning, and improved path planning in complex environments; noted limitations include inability to adapt to complex environments and an experience replay buffer structure that reduces sampling efficiency.…”
Section: B. Algorithm Based on DDQN
Mentioning confidence: 99%
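The first improvement noted in that comparison, decoupled action selection and action evaluation, is the core Double DQN idea. The sketch below shows a generic form of that target computation, with dummy numbers; it is not the exact update used by any particular cited variant.

```python
import numpy as np


def ddqn_target(reward, q_online_next, q_target_next, done, gamma=0.99):
    # Action *selection* uses the online network, action *evaluation* uses
    # the target network -- the decoupling that defines Double DQN.
    best_action = int(np.argmax(q_online_next))
    bootstrap = q_target_next[best_action]
    return reward + (0.0 if done else gamma * bootstrap)


# Example with dummy Q estimates for a 4-action successor state.
y = ddqn_target(reward=1.0,
                q_online_next=np.array([0.2, 0.9, 0.1, 0.4]),
                q_target_next=np.array([0.3, 0.7, 0.2, 0.5]),
                done=False)
print(round(y, 3))  # 1.0 + 0.99 * 0.7 = 1.693
```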
“…As an upgraded version of the DDPG algorithm, it enhances performance and stability by incorporating a twin Q network and a delayed update strategy. Compared with the DQN algorithm, which must select the action with the maximum Q value from all actions and can therefore only handle environments with finite action spaces [14], the TD3 algorithm can handle continuous control tasks. Compared with the TRPO algorithm [15] and the PPO algorithm [16], which are both on-policy algorithms with relatively low sample efficiency, the TD3 algorithm is an off-policy algorithm with higher sample efficiency.…”
Section: Introduction
Mentioning confidence: 99%
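As a rough illustration of the contrast drawn above, the sketch below sets DQN's discrete arg-max action choice next to a schematic TD3 clipped double-Q target. It is a simplified sketch of the standard formulations, with made-up numbers, not code from the cited works.

```python
import numpy as np


def dqn_greedy_action(q_values):
    # DQN: enumerate a finite action set and take the action with the largest Q value.
    return int(np.argmax(q_values))


def td3_target(reward, q1_next, q2_next, done, gamma=0.99):
    # TD3: bootstrap from the minimum of the twin critics' estimates of the
    # target-policy action, which limits Q-value overestimation.
    bootstrap = min(q1_next, q2_next)
    return reward + (0.0 if done else gamma * bootstrap)


print(dqn_greedy_action(np.array([0.1, 0.7, 0.3])))                      # -> 1 (discrete choice)
print(round(td3_target(0.5, q1_next=2.0, q2_next=1.8, done=False), 3))   # -> 2.282
```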