2021
DOI: 10.3390/sym13061061

Deep Deterministic Policy Gradient Algorithm Based on Convolutional Block Attention for Autonomous Driving

Abstract: Research on autonomous driving based on deep reinforcement learning algorithms is a current hotspot. Traditional autonomous driving requires human involvement, and autonomous driving algorithms based on supervised learning must be trained in advance using human experience. To deal with autonomous driving problems, this paper proposes an improved end-to-end deep deterministic policy gradient (DDPG) algorithm based on the convolutional block attention mechanism, called multi-input attention pr…
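
The convolutional block attention mechanism named in the abstract is commonly implemented as a CBAM-style pair of channel and spatial attention modules applied to the convolutional feature maps. The following is a minimal PyTorch sketch of such a block, assuming the standard CBAM formulation; the class names and the reduction ratio are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Re-weights feature channels using global average- and max-pooled descriptors."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # shared MLP on average-pooled features
        mx = self.mlp(x.amax(dim=(2, 3)))    # shared MLP on max-pooled features
        return x * torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    """Re-weights spatial locations using channel-wise average and max maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAMBlock(nn.Module):
    """Channel attention followed by spatial attention, applied to a feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```

In an end-to-end DDPG agent, a block like this would typically sit between the convolutional feature extractor and the actor/critic heads, letting the policy attend to the most informative regions of the driving image.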

Cited by 5 publications (3 citation statements)
References 25 publications
“…Therefore, based on the absolute value of the TD-error δ_t for each sample, the priority of that sample is proportional to |δ_t|. The SumTree binary tree structure is employed to store samples in the prioritized experience replay buffer [28]. The samples with larger absolute TD-errors are more likely to be sampled, leading to faster convergence of the algorithm.…”
Section: Prioritized Experience Replay (mentioning)
confidence: 99%
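
The SumTree referenced in the statement above is a complete binary tree whose leaves hold the transition priorities and whose internal nodes store the sum of their children, so a transition can be drawn with probability proportional to its priority in O(log n) time. Below is a minimal sketch of such a structure, assuming priorities of the form (|δ_t| + ε)^α; the class and method names are illustrative, not the authors' implementation.

```python
import numpy as np

class SumTree:
    """Leaves store transition priorities; each internal node stores the sum
    of its two children, so the root holds the total priority mass."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)   # internal nodes followed by leaves
        self.data = [None] * capacity            # stored transitions
        self.write = 0                           # next leaf slot to overwrite

    def add(self, priority: float, transition):
        leaf = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf: int, priority: float):
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:                         # propagate the change up to the root
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def sample(self, value: float):
        """Walk down from the root: a leaf with a larger priority covers a
        larger sub-interval of [0, total), so it is selected more often."""
        idx = 0
        while idx < self.capacity - 1:           # stop once a leaf is reached
            left, right = 2 * idx + 1, 2 * idx + 2
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = right
        return self.tree[idx], self.data[idx - self.capacity + 1]

    @property
    def total(self) -> float:
        return self.tree[0]
```

Sampling a minibatch then amounts to drawing N values uniformly from [0, total) and calling sample on each, which reproduces the behaviour described above: transitions with larger |δ_t| are replayed more often.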
“…The researchers [29] established in their study that the policy’s performance is represented by this deterministic policy gradient. As seen in Fig 3, the critic network determines the next state s' of the actor network by assessing the value of the state-action pair (s, a) in order to maximize the Q-value [30]. The output Q(s, a | θ^Q) of the critic network is used to compute the critic loss function L, where N is the minibatch size sampled from the replay buffer, the index i refers to the i-th sample, and y_i is the temporal-difference target.…”
Section: Background Theory (mentioning)
confidence: 99%
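
Written out, the critic loss described in this statement is the standard DDPG mean-squared TD error, L = (1/N) Σ_i (y_i − Q(s_i, a_i | θ^Q))², with the temporal-difference target y_i = r_i + γ Q'(s_{i+1}, μ'(s_{i+1} | θ^μ') | θ^Q') computed from the target actor and target critic. The sketch below assumes that standard formulation; the function signature, the done mask, and the returned TD error are illustrative choices rather than details from the paper.

```python
import torch

def ddpg_critic_loss(critic, target_critic, target_actor, batch, gamma: float = 0.99):
    """Mean squared error between Q(s, a | θ^Q) and the TD target y_i,
    as in standard DDPG; also returns δ_i for use as a replay priority."""
    s, a, r, s_next, done = batch                          # minibatch of N transitions
    with torch.no_grad():
        a_next = target_actor(s_next)                      # μ'(s_{i+1} | θ^μ')
        y = r + gamma * (1.0 - done) * target_critic(s_next, a_next)   # y_i
    q = critic(s, a)                                       # Q(s_i, a_i | θ^Q)
    td_error = y - q                                       # δ_i
    return (td_error ** 2).mean(), td_error.detach()
```

The absolute value of the returned TD error is exactly the quantity that the prioritized replay buffer above uses to set each sample's priority.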