2022
DOI: 10.1016/j.eswa.2021.116023
Prioritized Experience Replay based on Multi-armed Bandit

Cited by 17 publications (6 citation statements) · References 12 publications
“…Du et al [32] propose a framework to refresh experiences in the replay buffer with respect to the current policy. Liu et al [33] propose a dynamic experience replay strategy that combines multiple priority-weighted criteria to measure the importance of experiences. Yang and Peng [34] propose the Meta-learning-based ER (MSER) to deal with the computational complexity and the need for careful hyperparameter adjustments in PER.…”
Section: Literature Review and Related Work
confidence: 99%
“…Mattar & Daw [3] suggested that the reactivation of sequences of behaviourally-relevant experiences during quiet wakefulness and sleep for which the hippocampus is well known [35] is an expression of this sort of prioritized replay. They thereby explained a wealth of experimental findings on the selection of replay experiences in rodents [32,33] as well as humans [5,36].…”
Section: Gain and Need
confidence: 99%
“…Instead of addressing the sampling strategy, Du et al (2022) proposed a framework to refresh experiences by moving the agent back to past states, executing sequences of actions following its current policy, and storing and reusing the new experiences from this process if they turned out better than what the agent previously experienced. Liu et al (2022) proposed a dynamic experience replay strategy based on Multi-armed Bandit, which combines multiple priority-weighted criteria to measure the importance of experiences and adjusts their weights from one episode to another. Yang and Peng (2021) proposed the Meta-learning-based Experience Replay (MSER), applied in DDPG, to deal with the computational complexity of PER and its need for careful hyperparameter adjustments.…”
Section: Research On Experience Replay and Some Directions For Future...
confidence: 99%
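The bandit-weighted replay idea described in the excerpts can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual method of Liu et al: an ε-greedy bandit picks one priority criterion per episode, experiences are sampled with probability proportional to that criterion's score, and the chosen arm's value estimate is updated from the episode return. All class, method, and criterion names here are hypothetical.

```python
import random

# Illustrative priority criteria an arm can select among.
CRITERIA = ["td_error", "reward", "recency"]

class BanditReplay:
    """Replay buffer whose sampling criterion is chosen by an epsilon-greedy bandit."""

    def __init__(self, capacity=1000, eps=0.1):
        self.buffer = []                               # list of (experience, metrics)
        self.capacity = capacity
        self.eps = eps
        self.value = {c: 0.0 for c in CRITERIA}        # running value estimate per arm
        self.count = {c: 0 for c in CRITERIA}          # pulls per arm

    def add(self, experience, td_error, reward):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)                         # drop oldest (FIFO)
        self.buffer.append((experience,
                            {"td_error": abs(td_error),
                             "reward": reward,
                             "recency": len(self.buffer)}))

    def select_arm(self):
        # Epsilon-greedy: explore a random criterion, else exploit the best one.
        if random.random() < self.eps:
            return random.choice(CRITERIA)
        return max(CRITERIA, key=lambda c: self.value[c])

    def sample(self, batch_size, arm):
        # Sample proportionally to the chosen criterion; clamp to keep weights positive.
        weights = [max(m[arm], 1e-6) for _, m in self.buffer]
        return random.choices([e for e, _ in self.buffer],
                              weights=weights, k=batch_size)

    def update_arm(self, arm, episode_return):
        # Incremental-mean update of the arm's value from the episode return.
        self.count[arm] += 1
        self.value[arm] += (episode_return - self.value[arm]) / self.count[arm]
```

In use, the agent calls `select_arm()` at the start of each episode, samples minibatches with that criterion throughout the episode, and calls `update_arm()` with the episode return afterward, so the weighting among criteria adapts from one episode to the next, as the excerpt describes.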