2021
DOI: 10.48550/arxiv.2101.05982
Preprint

Randomized Ensembled Double Q-Learning: Learning Fast Without a Model

Abstract: Using a high Update-To-Data (UTD) ratio, model-based methods have recently achieved much higher sample efficiency than previous model-free methods for continuous-action DRL benchmarks. In this paper, we introduce a simple model-free algorithm, Randomized Ensembled Double Q-Learning (REDQ), and show that its performance is just as good as, if not better than, a state-of-the-art model-based algorithm for the MuJoCo benchmark. Moreover, REDQ can achieve this performance using fewer parameters than the model-based m…
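Since the abstract is truncated above, a minimal sketch of the paper's central mechanism may help: REDQ forms its TD target by taking a minimum over a small random subset of a larger critic ensemble, and repeats this update many times per environment step (the UTD ratio). The sketch below assumes a PyTorch, SAC-style setup; the names (redq_target, m, alpha) are illustrative, not the authors' reference code.

import random
import torch

def redq_target(target_critics, rewards, dones, next_obs, next_actions,
                next_log_probs, gamma=0.99, alpha=0.2, m=2):
    """TD target via in-target minimization over a random critic subset."""
    with torch.no_grad():
        # Sample M of the N target critics and take the elementwise min
        # of their predictions; this is REDQ's overestimation control.
        subset = random.sample(list(target_critics), m)
        q_values = torch.stack([q(next_obs, next_actions) for q in subset])
        min_q = q_values.min(dim=0).values
        # SAC-style entropy-regularized bootstrap target (assumed here,
        # since REDQ is built on top of SAC).
        return rewards + gamma * (1.0 - dones) * (min_q - alpha * next_log_probs)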

Cited by 18 publications (59 citation statements) | References 22 publications

“…Moreover, we implement the critic's model ensemble as a single neural network, using linear non-fully-connected layers evenly splitting the nodes and dropping the weight connections between the splits. Practically, when evaluated under the same hardware, this results in our algorithm running more than two times faster than the implementation from Chen et al. (2021) while having a similar algorithmic complexity.…”
Section: Methods (mentioning)
confidence: 93%
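The single-network trick described in this citation amounts to a block-diagonal linear layer: each ensemble member owns one block of the weights, and cross-block connections are dropped. Below is one hedged way such a layer could look in PyTorch; SplitLinear and n_members are assumed names, and this is a sketch of the idea rather than the citing authors' implementation.

import torch
import torch.nn as nn

class SplitLinear(nn.Module):
    """One layer holding n_members independent linear maps.

    Equivalent to a block-diagonal weight matrix: nodes are split evenly
    between members and no weights connect the splits, so the whole
    ensemble runs as a single batched matmul.
    """
    def __init__(self, n_members: int, in_features: int, out_features: int):
        super().__init__()
        # One (in_features, out_features) weight block per member, fan-in scaled.
        self.weight = nn.Parameter(
            torch.randn(n_members, in_features, out_features) * in_features ** -0.5
        )
        self.bias = nn.Parameter(torch.zeros(n_members, 1, out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_members, batch, in_features) -> (n_members, batch, out_features)
        return torch.baddbmm(self.bias, x, self.weight)

# Usage sketch: tile a shared (batch, in_features) input across members,
# then stack SplitLinear layers to run the whole critic ensemble as one net:
#   x = obs_act.unsqueeze(0).expand(n_members, -1, -1)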
“…Moreover, Lan et al. (2020) introduced a sampling procedure for the critic's ensemble predictions to regulate underestimation in the TD-targets. Their work was later extended to the continuous setting by Chen et al. (2021), who showed that large ensembles combined with a high update-to-data ratio make it possible to outperform the sample efficiency of contemporary model-based methods. Ensembling has also been used to achieve better exploration following the principle of optimism in the face of uncertainty (Brafman & Tennenholtz, 2002) in both discrete (Osband et al., 2016; Chen et al., 2017) and continuous settings (Ciosek et al., 2019).…”
Section: Related Work (mentioning)
confidence: 99%