2022
DOI: 10.48550/arxiv.2206.10558
Preprint

EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine

Abstract: There has been significant progress in developing reinforcement learning (RL) training systems. Past works such as IMPALA, Apex, Seed RL, Sample Factory, and others aim to improve the system's overall throughput. In this paper, we try to address a common bottleneck in the RL training system, i.e., parallel environment execution, which is often the slowest part of the whole system but receives little attention. With a curated design for paralleling RL environments, we have improved the RL environment simulation…
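For context, a minimal usage sketch of EnvPool's batched, gym-style Python interface follows. It is based on the project's documented API (`envpool.make` with `env_type` and `num_envs`), but exact return conventions vary by version (older gym 4-tuple vs. newer gymnasium 5-tuple), so treat the shapes and unpacking below as illustrative rather than definitive:

```python
import numpy as np
import envpool

# Create 64 Atari Pong environments executed by EnvPool's C++ backend;
# the returned object behaves like a single gym environment whose
# observations, rewards, and done flags are batched along axis 0.
envs = envpool.make("Pong-v5", env_type="gym", num_envs=64)

obs = envs.reset()                          # e.g. shape (64, 4, 84, 84) for Atari
actions = np.zeros(64, dtype=np.int32)      # one action per sub-environment
obs, rew, done, info = envs.step(actions)   # all 64 environments stepped in one call
```

The point of the batched call is that a single `step` dispatches all sub-environments to the engine's internal thread pool, so the Python side pays one function-call and synchronization cost per batch rather than per environment.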

Cited by 2 publications (3 citation statements)
References 18 publications
“…Existing experience replay systems such as RLlib (Liang et al, 2018), stable-baseline (Hill et al, 2018), rlpyt (Stooke & Abbeel, 2019), tianshou (Weng et al, 2022a), sample factory (Petrenko et al, 2020), and envpool (Weng et al, 2022b) have been predominantly designed for relatively smaller RL models and their implementations are confined to single-server contexts. Consequently, they are not equipped to handle the distributed trajectory storage, selection, and collection necessary for training large RL models.…”
Section: Limitations Of Existing Systems
confidence: 99%
“…Existing experience replay systems, unfortunately, fall short in fully addressing the aforementioned challenges. Most of these systems, such as RLlib (Liang et al, 2018), RL-Zoo (Ding et al, 2021), stable-baselines (Hill et al, 2018), rlpyt (Stooke & Abbeel, 2019), tianshou (Weng et al, 2022a), TorchOpt-RL (Liu et al, 2022; Ren et al, 2022), sample factory (Petrenko et al, 2020), and envpool (Weng et al, 2022b), are incorporated as part of single-server RL frameworks and fail to offer distributed trajectory storage, selection, and collection. The recent development in distributed experience replay systems, exemplified by Reverb (Cassirer et al, 2021), allows for storing trajectories on memory-optimized servers.…”
Section: Introduction
confidence: 99%
“…It also suffers from reduced usability (e.g., difficult to add diverse assets) and functionality (e.g., object contacts are inaccessible). EnvPool (Weng et al, 2022) batches environments by a thread pool to minimize synchronization and improve CPU utilization. Yet its environments need to be implemented in C++, which hinders fast prototyping (e.g., customizing observations and rewards).…”
Section: Introduction
confidence: 99%
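To make the thread-pool batching idea in the quotation above concrete, here is a hypothetical pure-Python sketch. `DummyEnv`, `BatchedEnvPool`, and every name in it are illustrative assumptions, not EnvPool's actual C++ implementation: worker threads step all sub-environments for the current batch of actions, and the caller receives stacked arrays with one synchronization point per batch.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

class DummyEnv:
    """Stand-in environment; a real setup would wrap Atari, MuJoCo, etc."""
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)

    def step(self, action):
        obs = self.rng.standard_normal(4)    # fake observation
        return obs, float(action), False     # (obs, reward, done)

class BatchedEnvPool:
    """Step a batch of environments on a shared thread pool and return
    stacked (obs, reward, done) arrays; synchronization happens once per
    batch rather than once per environment."""
    def __init__(self, num_envs, num_threads=8):
        self.envs = [DummyEnv(i) for i in range(num_envs)]
        self.pool = ThreadPoolExecutor(max_workers=num_threads)

    def step(self, actions):
        results = list(self.pool.map(
            lambda ea: ea[0].step(ea[1]), zip(self.envs, actions)))
        obs, rew, done = map(np.asarray, zip(*results))
        return obs, rew, done

pool = BatchedEnvPool(num_envs=16)
obs, rew, done = pool.step(np.ones(16))      # obs.shape == (16, 4)
```

Because Python's GIL prevents pure-Python environment steps from running truly in parallel, EnvPool implements both the environments and the thread pool in C++, which is precisely the usability trade-off (fast execution vs. C++-only environment authoring) that the quotation points out.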