2021
DOI: 10.1007/978-3-030-84910-8_15

A Movement Adjustment Method for DQN-Based Autonomous Aerial Vehicle

Cited by 4 publications (3 citation statements)
References 22 publications
“…Nobuki et al. [25] proposed a DQN algorithm based on a taboo list strategy (TLS-DQN) for indoor single-path environments. Although this algorithm can find a path to the target point in simple indoor environments, it cannot guarantee effective path planning in complex environments because its simulations were limited to environments with few obstacles.…”
Section: A. Algorithm Based on DQN
Mentioning confidence: 99%
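To make the taboo-list idea concrete, the sketch below shows epsilon-greedy DQN action selection that masks moves returning to recently visited grid cells. This is a minimal illustration with assumed names (`select_action`, `next_position_of`, a four-move grid agent and dummy Q values); the actual taboo-list rule used by TLS-DQN in [25] may differ.

```python
import random
from collections import deque

import numpy as np


def select_action(q_values, taboo_positions, next_position_of, epsilon=0.1):
    # Epsilon-greedy choice restricted to actions whose successor cell is not taboo.
    actions = list(range(len(q_values)))
    allowed = [a for a in actions if next_position_of(a) not in taboo_positions]
    if not allowed:
        allowed = actions  # every move is taboo: fall back to the full action set
    if random.random() < epsilon:
        return random.choice(allowed)
    return max(allowed, key=lambda a: q_values[a])


# Usage: keep a bounded list of recently visited cells as the taboo list.
moves = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}  # hypothetical action map
taboo = deque(maxlen=20)
pos = (5, 5)
q = np.array([0.4, 1.2, -0.3, 0.8])  # Q(s, a) from the network (dummy values)
a = select_action(q, set(taboo),
                  lambda act: (pos[0] + moves[act][0], pos[1] + moves[act][1]))
taboo.append(pos)
```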
“…[Flattened comparison table of DQN/DDQN-based path-planning methods; the original row and column alignment is not recoverable.] Methods listed: DQN [24] (Gu et al., 2022); TLS-DQN [25]; DDQN combined with ADQN [27] (Zhang et al., 2022); DDQN based on a greedy strategy [28]; DDQN based on prior knowledge with an integrated action mask method [29] (Yang et al., 2022); DDQN based on a dynamic compound reward function [30]; MS-DDQN [31] (Peng et al., 2021); ECMS-DDQN [32]. Noted improvements include decoupled action selection and action evaluation, accelerated convergence, improved learning, and improved path planning in complex environments; noted limitations include inability to adapt to complex environments and an experience replay buffer structure that reduces sampling efficiency.…”
Section: B. Algorithm Based on DDQN
Mentioning confidence: 99%
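The first improvement noted in that comparison, decoupled action selection and action evaluation, is the core Double DQN idea. The sketch below shows a generic form of that target computation, with dummy numbers; it is not the exact update used by any particular cited variant.

```python
import numpy as np


def ddqn_target(reward, q_online_next, q_target_next, done, gamma=0.99):
    # Action *selection* uses the online network, action *evaluation* uses
    # the target network -- the decoupling that defines Double DQN.
    best_action = int(np.argmax(q_online_next))
    bootstrap = q_target_next[best_action]
    return reward + (0.0 if done else gamma * bootstrap)


# Example with dummy Q estimates for a 4-action successor state.
y = ddqn_target(reward=1.0,
                q_online_next=np.array([0.2, 0.9, 0.1, 0.4]),
                q_target_next=np.array([0.3, 0.7, 0.2, 0.5]),
                done=False)
print(round(y, 3))  # 1.0 + 0.99 * 0.7 = 1.693
```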
“…As an upgraded version of the DDPG algorithm, it enhances performance and stability by incorporating a twin Q network and a delayed update strategy. Compared with the DQN algorithm, which must select the action with the maximum Q value from all actions and can therefore only handle environments with finite action spaces [14], the TD3 algorithm can handle continuous control tasks. Compared with the TRPO algorithm [15] and the PPO algorithm [16], which are both on-policy algorithms with relatively low sample efficiency, the TD3 algorithm is an off-policy algorithm with higher sample efficiency.…”
Section: Introduction
Mentioning confidence: 99%
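As a rough illustration of the contrast drawn above, the sketch below sets DQN's discrete arg-max action choice next to a schematic TD3 clipped double-Q target. It is a simplified sketch of the standard formulations, with made-up numbers, not code from the cited works.

```python
import numpy as np


def dqn_greedy_action(q_values):
    # DQN: enumerate a finite action set and take the action with the largest Q value.
    return int(np.argmax(q_values))


def td3_target(reward, q1_next, q2_next, done, gamma=0.99):
    # TD3: bootstrap from the minimum of the twin critics' estimates of the
    # target-policy action, which limits Q-value overestimation.
    bootstrap = min(q1_next, q2_next)
    return reward + (0.0 if done else gamma * bootstrap)


print(dqn_greedy_action(np.array([0.1, 0.7, 0.3])))                      # -> 1 (discrete choice)
print(round(td3_target(0.5, q1_next=2.0, q2_next=1.8, done=False), 3))   # -> 2.282
```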