2021
DOI: 10.48550/arxiv.2102.04881
Preprint

Measuring Progress in Deep Reinforcement Learning Sample Efficiency

Abstract: Sampled environment transitions are a critical input to deep reinforcement learning (DRL) algorithms. Current DRL benchmarks often allow for the cheap and easy generation of large amounts of samples such that perceived progress in DRL does not necessarily correspond to improved sample efficiency. As simulating real world processes is often prohibitively hard and collecting real world experience is costly, sample efficiency is an important indicator for economically relevant applications of DRL. We investigate …

Cited by 3 publications (5 citation statements)
References: 66 publications
“…Sample efficiency (SE) (Chen et al, 2021; Dorner, 2021) is measured by the ratio of the number of samples collected when RAC and some algorithms reach the specified performance. Hopper is not included in the comparison, as the performance of the algorithms is almost indistinguishable.…”
Section: Methods
Mentioning confidence: 99%
“…Results of MBPO are obtained at 3 × 10^5 time steps for Ant, Humanoid, and Walker2d, 4 × 10^5 for HalfCheetah, and 1.25 × 10^5 for Hopper. Sample efficiency (Chen et al, 2021; Dorner, 2021) is measured by the ratio of the number of samples collected when RAC and some algorithms reach the specified performance. The last four rows show how many times RAC is more sample efficient than other algorithms in achieving that performance.…”
Section: Setups
Mentioning confidence: 99%
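The ratio quoted above can be computed directly from logged learning curves. Below is a minimal sketch in Python, assuming each algorithm's training log is a list of (environment steps, evaluation return) pairs; the function names, data layout, and example numbers are illustrative assumptions, not taken from the cited papers.

# Minimal sketch: sample-efficiency ratio from learning curves.
# Assumes each curve is a list of (env_steps, eval_return) pairs sorted by steps;
# the data layout and function names are hypothetical, not from the cited papers.

def steps_to_reach(curve, target_return):
    """Return the first environment-step count at which the evaluation
    return meets or exceeds the target, or None if it never does."""
    for env_steps, eval_return in curve:
        if eval_return >= target_return:
            return env_steps
    return None

def sample_efficiency_ratio(baseline_curve, candidate_curve, target_return):
    """How many times fewer samples the candidate needs than the baseline
    to reach the same specified performance."""
    baseline_steps = steps_to_reach(baseline_curve, target_return)
    candidate_steps = steps_to_reach(candidate_curve, target_return)
    if baseline_steps is None or candidate_steps is None:
        raise ValueError("one of the algorithms never reached the target return")
    return baseline_steps / candidate_steps

# Made-up example: the candidate reaches a return of 3000 after 100k steps,
# the baseline after 300k steps, giving a ratio of 3.0.
baseline = [(100_000, 1500.0), (200_000, 2400.0), (300_000, 3100.0)]
candidate = [(50_000, 1800.0), (100_000, 3200.0)]
print(sample_efficiency_ratio(baseline, candidate, target_return=3000.0))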
“…As an example, the value approximation function was replaced in Ref. [82] with Kanerva-based function approximation. Although a performance improvement was evidenced, the algorithms were still not suitable for large-scale networks.…”
Section: Multi-agent Reinforcement Learning Approach For Joint Sensin…
Mentioning confidence: 99%
“…We implement RAC with SAC and TD3 as RAC-SAC and RAC-TD3 (see more details in Appendix B). We compare to state-of-the-art algorithms: SAC [17], TD3 [15], MBPO [20], and REDQ [10]. Sample efficiency [10, 12] is measured by the ratio of the number of samples collected when RAC and some algorithm reach the specified performance. The last 4 rows show how many times REDQ is more sample efficient than other algorithms in reaching that performance.…”
Section: Setups
Mentioning confidence: 99%