Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence 2017
DOI: 10.24963/ijcai.2017/483

Weighted Double Q-learning

Abstract: Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in stochastic environments because it overestimates action values. The overestimation stems from its use of a single estimator, which takes the maximum action value as an approximation of the maximum expected action value. To avoid overestimation in Q-learning, the double Q-learning algorithm was recently proposed, which uses the double estimator method. It uses two estimators built from independent sets of experiences, with one estimator deter…
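The abstract describes the single-estimator bias and the double-estimator remedy; the sketch below illustrates how a weighted combination of the two estimators could look in a tabular setting. It is a minimal, hedged sketch: the mixing weight beta, the constant c, and the function names are illustrative assumptions, not the paper's exact formulation.

```python
import random
from collections import defaultdict

# Minimal tabular sketch of a weighted double-estimator update.
# Two tables QA and QB are kept; on each step one of them is updated
# using a target that mixes the single-estimator value (QA at the
# maximizing action) with the double-estimator value (QB at that same
# action). The weight beta and the constant c are illustrative
# assumptions, not the paper's exact form.

def weighted_double_q_update(QA, QB, s, a, r, s_next, actions,
                             alpha=0.1, gamma=0.99, c=1.0):
    """Update QA[s][a] in place; call with QA and QB swapped on
    alternate steps so the two estimators stay symmetric."""
    a_star = max(actions, key=lambda x: QA[s_next][x])    # maximizing action under QA
    a_low = min(actions, key=lambda x: QA[s_next][x])     # minimizing action under QA
    spread = abs(QB[s_next][a_star] - QB[s_next][a_low])
    beta = spread / (c + spread)                          # mix between single and double estimator
    value = beta * QA[s_next][a_star] + (1.0 - beta) * QB[s_next][a_star]
    target = r + gamma * value
    QA[s][a] += alpha * (target - QA[s][a])

# Usage sketch: Q-tables as nested defaultdicts over (state, action).
QA = defaultdict(lambda: defaultdict(float))
QB = defaultdict(lambda: defaultdict(float))
actions = [0, 1]
if random.random() < 0.5:
    weighted_double_q_update(QA, QB, s=0, a=1, r=1.0, s_next=2, actions=actions)
else:
    weighted_double_q_update(QB, QA, s=0, a=1, r=1.0, s_next=2, actions=actions)
```

Swapping the roles of the two tables on alternate steps is what keeps action selection and value estimation based on independent sets of experiences, as the abstract describes.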

Cited by 78 publications (36 citation statements). References 7 publications.
“…We plan to introduce the ensemble Q-network [23, 37, 38] or weighted Q estimates [39, 40] to reduce the biases of the estimated Q-values and improve the stability of our algorithm. Another important aspect would be improving the exploration ability, which will further enhance the performance of our algorithm in the face of a stronger opponent bot.…”
Section: Discussion
confidence: 99%
“…However, it is not trivial to integrate Constrained DQN with DDQN and its extension called Weighted Double Q learning (Zhang et al, 2017), because in these methods the target network was used to decompose the max operation into action selection and action evaluation. To reduce the problem of overestimation, the mellowmax operator (Kim et al, 2019) is promising, which is a variant of Soft Q learning.…”
Section: Discussion
confidence: 99%
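The excerpt above notes that DDQN-style methods split the max operation into action selection (online network) and action evaluation (target network). Below is a small illustrative sketch of that decomposition, with NumPy arrays standing in for the two networks' Q-value outputs; the shapes and names are assumptions, not code from any of the cited works.

```python
import numpy as np

# Sketch of the selection/evaluation split behind Double DQN targets:
# the online network picks the greedy next action, while the target
# network supplies that action's value. Array shapes and names here
# are illustrative assumptions.

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """rewards, dones: shape (batch,); q_*_next: shape (batch, n_actions)."""
    a_star = np.argmax(q_online_next, axis=1)                  # action selection (online net)
    evaluated = q_target_next[np.arange(len(a_star)), a_star]  # action evaluation (target net)
    return rewards + gamma * (1.0 - dones) * evaluated

# Usage sketch with a batch of two transitions.
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
q_online_next = np.array([[0.2, 0.5], [0.1, 0.3]])
q_target_next = np.array([[0.4, 0.1], [0.2, 0.6]])
print(double_dqn_targets(rewards, dones, q_online_next, q_target_next))
```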
“…It is apparent that the probability of selecting a state-action pair is increased when an ErrP is absent following 800 ms of occurrence of an ERD/ERS. The proposed work on probabilistic reinforcement learning (PRL), is compared with two variants of Double Q-Learning namely, DQL1 [40] and DQL2 [41], Rainbow Algorithm [42], and Deep Reinforcement Learning (DRL) [43]. Fig.…”
Section: F Classification
confidence: 99%