2021 IEEE International Conference on Multimedia and Expo (ICME)
DOI: 10.1109/icme51207.2021.9428188
DDPER: Decentralized Distributed Prioritized Experience Replay

Abstract: In off-policy reinforcement learning, prioritized experience replay plays an important role. However, centralized prioritized experience replay becomes a bottleneck for efficient training. We propose to approximate centralized prioritized experience replay in a distributed, decentralized way under certain mild assumptions. Specifically, each actor stores samples in its local replay in the same way as prioritized experience replay, and the learner fetches a batch of samples from these replays followi…
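The scheme the abstract describes can be sketched in code: each actor keeps its own proportional-priority buffer, and the learner assembles a training batch by drawing from every actor's local buffer rather than from one central replay. This is a minimal illustration, not the paper's implementation; the class and function names, the priority exponent, and the equal per-actor split are assumptions for the sketch.

```python
import random

class LocalPrioritizedReplay:
    """Per-actor replay buffer storing transitions with TD-error-based
    priorities, in the style of prioritized experience replay (PER)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha      # priority exponent (assumed value)
        self.storage = []       # list of (priority, transition) pairs

    def add(self, transition, td_error):
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)  # drop the oldest entry when full
        self.storage.append((priority, transition))

    def sample(self, k):
        # Proportional sampling: P(i) is proportional to priority_i.
        weights = [p for p, _ in self.storage]
        chosen = random.choices(self.storage, weights=weights, k=k)
        return [t for _, t in chosen]

def learner_fetch(actor_buffers, batch_size):
    """Approximate centralized PER: the learner draws an (assumed) equal
    share of the batch from each actor's local prioritized replay."""
    per_actor = batch_size // len(actor_buffers)
    batch = []
    for buf in actor_buffers:
        batch.extend(buf.sample(per_actor))
    return batch
```

Because each actor samples only over its own priorities, no globally sorted priority structure is needed; the decentralized batch approximates the centralized distribution when priorities are comparably scaled across actors.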

Cited by 1 publication (2 citation statements); references 4 publications.
“…As an improvement method for sending more useful experiences, Liu et al. [24] propose a distributed reinforcement learning architecture in which both the actors and the learner have a replay memory. Experience sampling is done twice, hierarchically, in both the actors and the learner.…”
Section: Improvement Methods of Transfer Efficiency of Distributed Re…
confidence: 99%
“…To address the heterogeneity of various computing platforms, it is expected that multiple buffer nodes are implemented on multiple edge servers; in this case, the number of experiences sampled from each buffer node should be tuned depending on the experience-generation rate of each buffer node (e.g., more experiences should be sampled from a buffer node that produces more experiences). As the number of buffer and actor nodes increases, the learner node would become a performance bottleneck; such a scalability issue can be addressed by introducing a hierarchical structure as in [24], and this direction is our future work.…”
Section: Effectiveness and Scalability of Proposed Approach for Rea…
confidence: 99%
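The tuning the second citation suggests — drawing more samples from buffer nodes that generate experiences faster — amounts to allocating the learner's batch in proportion to per-node generation rates. A small sketch, with the function name and the remainder-distribution rule being assumptions of this illustration:

```python
def allocate_samples(batch_size, generation_rates):
    """Split a learner batch across buffer nodes proportionally to each
    node's experience-generation rate, so faster nodes contribute more."""
    total = sum(generation_rates)
    # Integer share for each node, proportional to its rate.
    counts = [batch_size * r // total for r in generation_rates]
    # Hand any leftover samples to the fastest nodes first.
    remainder = batch_size - sum(counts)
    order = sorted(range(len(counts)), key=lambda i: -generation_rates[i])
    for i in order[:remainder]:
        counts[i] += 1
    return counts
```

For example, a batch of 10 split across nodes with rates 1:2:2 yields 2, 4, and 4 samples, keeping the sampled mix aligned with where experience is actually produced.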