2021
DOI: 10.3390/sym13061061

Deep Deterministic Policy Gradient Algorithm Based on Convolutional Block Attention for Autonomous Driving

Abstract: Research on autonomous driving based on deep reinforcement learning algorithms is a current hotspot. Traditional autonomous driving requires human involvement, and autonomous driving algorithms based on supervised learning must be trained in advance using human experience. To deal with autonomous driving problems, this paper proposes an improved end-to-end deep deterministic policy gradient (DDPG) algorithm based on the convolutional block attention mechanism, called multi-input attention pr…
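
The convolutional block attention mechanism named in the abstract is commonly implemented as a CBAM-style pair of channel and spatial attention modules applied to the convolutional feature maps. The following is a minimal PyTorch sketch of such a block, assuming the standard CBAM formulation; the class names and the reduction ratio are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Re-weights feature channels using global average- and max-pooled descriptors."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # shared MLP on average-pooled features
        mx = self.mlp(x.amax(dim=(2, 3)))    # shared MLP on max-pooled features
        return x * torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    """Re-weights spatial locations using channel-wise average and max maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAMBlock(nn.Module):
    """Channel attention followed by spatial attention, applied to a feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```

In an end-to-end DDPG agent, a block like this would typically sit between the convolutional feature extractor and the actor/critic heads, letting the policy attend to the most informative regions of the driving image.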

Cited by 5 publications (3 citation statements)
References 25 publications
“…Therefore, based on the absolute value of the TD-error δ_t for each sample, the priority of that sample is proportional to |δ_t|. The SumTree binary tree structure is employed to store samples in the prioritized experience replay buffer [28]. The samples with larger absolute TD-errors are more likely to be sampled, leading to faster convergence of the algorithm.…”
Section: Prioritized Experience Replay (mentioning)
confidence: 99%
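
The SumTree referenced in the statement above is a complete binary tree whose leaves hold the transition priorities and whose internal nodes store the sum of their children, so a transition can be drawn with probability proportional to its priority in O(log n) time. Below is a minimal sketch of such a structure, assuming priorities of the form (|δ_t| + ε)^α; the class and method names are illustrative, not the authors' implementation.

```python
import numpy as np

class SumTree:
    """Leaves store transition priorities; each internal node stores the sum
    of its two children, so the root holds the total priority mass."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)   # internal nodes followed by leaves
        self.data = [None] * capacity            # stored transitions
        self.write = 0                           # next leaf slot to overwrite

    def add(self, priority: float, transition):
        leaf = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf: int, priority: float):
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:                         # propagate the change up to the root
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def sample(self, value: float):
        """Walk down from the root: a leaf with a larger priority covers a
        larger sub-interval of [0, total), so it is selected more often."""
        idx = 0
        while idx < self.capacity - 1:           # stop once a leaf is reached
            left, right = 2 * idx + 1, 2 * idx + 2
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = right
        return self.tree[idx], self.data[idx - self.capacity + 1]

    @property
    def total(self) -> float:
        return self.tree[0]
```

Sampling a minibatch then amounts to drawing N values uniformly from [0, total) and calling sample on each, which reproduces the behaviour described above: transitions with larger |δ_t| are replayed more often.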
“…The researchers [29] established in their study that the policy’s performance is represented by this deterministic policy gradient. As seen in Fig 3, the critic network determines the next state s' of the actor network by assessing the value of the state-action pair (s, a) in order to maximize the Q-value [30]. The output Q(s, a | θ^Q) of the critic network is used to compute the critic loss function L, where N is the minibatch size sampled from the replay buffer, the index i refers to the i-th sample, and y_i is the temporal-difference target.…”
Section: Background Theory (mentioning)
confidence: 99%
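
Written out, the critic loss described in this statement is the standard DDPG mean-squared TD error, L = (1/N) Σ_i (y_i − Q(s_i, a_i | θ^Q))², with the temporal-difference target y_i = r_i + γ Q'(s_{i+1}, μ'(s_{i+1} | θ^μ') | θ^Q') computed from the target actor and target critic. The sketch below assumes that standard formulation; the function signature, the done mask, and the returned TD error are illustrative choices rather than details from the paper.

```python
import torch

def ddpg_critic_loss(critic, target_critic, target_actor, batch, gamma: float = 0.99):
    """Mean squared error between Q(s, a | θ^Q) and the TD target y_i,
    as in standard DDPG; also returns δ_i for use as a replay priority."""
    s, a, r, s_next, done = batch                          # minibatch of N transitions
    with torch.no_grad():
        a_next = target_actor(s_next)                      # μ'(s_{i+1} | θ^μ')
        y = r + gamma * (1.0 - done) * target_critic(s_next, a_next)   # y_i
    q = critic(s, a)                                       # Q(s_i, a_i | θ^Q)
    td_error = y - q                                       # δ_i
    return (td_error ** 2).mean(), td_error.detach()
```

The absolute value of the returned TD error is exactly the quantity that the prioritized replay buffer above uses to set each sample's priority.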