2019
DOI: 10.1016/j.knosys.2019.03.018
A novel multi-step Q-learning method to improve data efficiency for deep reinforcement learning

Cited by 45 publications (7 citation statements) · References 19 publications
“…As a solution to this, deep reinforcement learning (DRL), which combines deep learning and reinforcement learning, is considered an effective alternative. For example, multi-step learning DQN [24] proposed using the accumulated multi-step reward, rather than a one-step bootstrap, when calculating the target Q value. If Q-learning uses the reward information gathered over n steps before bootstrapping, the amount of computation required for learning is expected to be greatly reduced.…”
Section: Discussion
confidence: 99%
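The n-step bootstrapped target described in this citation statement can be sketched as follows. This is an illustrative reconstruction of the standard multi-step Q-learning target, not code from the cited paper; the function name and signature are assumptions.

```python
def n_step_target(rewards, q_next_max, gamma=0.99):
    """Illustrative sketch: n-step bootstrapped target for Q-learning.

    rewards    : the n observed rewards r_t, ..., r_{t+n-1}
    q_next_max : max_a Q(s_{t+n}, a), typically from a target network
    gamma      : discount factor
    """
    n = len(rewards)
    # Discounted sum of the n intermediate rewards ...
    g = sum(gamma ** k * r for k, r in enumerate(rewards))
    # ... plus a single bootstrap from the state reached after n steps,
    # instead of bootstrapping after every single step.
    return g + gamma ** n * q_next_max
```

With n = 1 this reduces to the ordinary one-step DQN target r + γ·max_a Q(s', a); larger n propagates reward information further per update, which is the data-efficiency argument the quoted passage makes.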
“…However, this strategy introduces estimation bias. To minimize variance while keeping the bias small, Schulman et al. [28] proposed the generalized advantage estimator to address this weakness. Going a step further, Schulman et al. [29] introduced trust region policy optimization (TRPO). To extend the approach to large-scale state-space DRL tasks, TRPO parameterizes the policy with deep neural networks and achieves end-to-end control using only the raw input image.…”
Section: Deep Reinforcement Learning Based On the Policy Gradient For...
confidence: 99%
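The generalized advantage estimator mentioned above trades bias for variance via a λ-weighted sum of one-step TD errors. A minimal sketch, assuming episode-length lists of rewards and value estimates (the function name is hypothetical, but the recurrence follows Schulman et al.'s formulation):

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Sketch of generalized advantage estimation (GAE).

    rewards : r_0 ... r_{T-1}
    values  : V(s_0) ... V(s_T)  (length T + 1; last entry bootstraps)
    Computes A_t = sum_l (gamma * lam)^l * delta_{t+l}, where
    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), via a backward pass.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

Setting λ = 0 recovers the low-variance, high-bias one-step TD advantage; λ = 1 recovers the high-variance Monte Carlo advantage, which is the bias/variance dial the quoted passage alludes to.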
“…Additionally, a separate target network is introduced to address oscillation and divergence during learning. Despite DQN's success in diverse applications and its widespread use in autonomous navigation [14,15], it has some drawbacks, notably the overestimation of action values inherent in Q-learning updates. This overestimation arises because the action with the highest value in the Q-network is selected in the next state, and the same Q-network is used both to select actions and to evaluate their values.…”
Section: Introduction
confidence: 99%
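The coupling described above, where one network both selects and evaluates the next action, is what Double DQN decouples. As an illustration of that remedy (van Hasselt et al.'s Double DQN, named here for context and not a method of the cited paper; the function is a hypothetical sketch):

```python
import numpy as np

def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99):
    """Sketch of the Double DQN target.

    The online network *selects* the greedy next action, while the
    separate target network *evaluates* it, which reduces the upward
    bias that arises when one network does both.
    """
    a_star = int(np.argmax(q_online_next))        # selection: online net
    return reward + gamma * q_target_next[a_star]  # evaluation: target net
```

Plain DQN would instead use `gamma * max(q_target_next)`, so any noise-inflated action value is both chosen and used as the target, producing the overestimation the quoted passage describes.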