Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/589
Experience Replay Optimization

Abstract: Experience replay enables reinforcement learning agents to memorize and reuse past experiences, just as humans replay memories for the situation at hand. Contemporary off-policy algorithms either replay past experiences uniformly or utilize a rule-based replay strategy, which may be sub-optimal. In this work, we consider learning a replay policy to optimize the cumulative reward. Replay learning is challenging because the replay memory is noisy and large, and the cumulative reward is unstable. To address these …
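
To make the replay mechanism described in the abstract concrete, here is a minimal Python sketch of a replay buffer that either samples uniformly (as standard off-policy algorithms do) or samples according to a scoring function standing in for a learned replay policy. The class and function names are illustrative placeholders and do not reproduce the paper's implementation.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size buffer of transitions (state, action, reward, next_state, done)."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample_uniform(self, batch_size):
        # Standard off-policy replay: every stored transition is equally
        # likely to be drawn.
        return random.sample(list(self.buffer), batch_size)

    def sample_scored(self, batch_size, score_fn):
        # Replay guided by a scoring function. score_fn maps a transition
        # to a positive scalar and stands in for a learned replay policy;
        # it is a placeholder, not the paper's formulation.
        scores = np.array([score_fn(t) for t in self.buffer], dtype=np.float64)
        probs = scores / scores.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx]


# Example scoring function (hypothetical): weight transitions by |reward|.
# buf.sample_scored(32, score_fn=lambda t: abs(t[2]) + 1e-3)
```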

Cited by 71 publications (62 citation statements)
References 0 publications
“…Compared to the original PER, UCB reached a better final policy. Recent research, called Experience Replay Optimization (ERO), dynamically adjusted the sampling policy to adapt to a variety of tasks [14]. ERO used the transition's reward, the TD-error, and the current timestep to estimate the priority of the data.…”
Section: Priority Experience Replay (mentioning)
confidence: 99%
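
The statement above names the three inputs ERO uses to score a transition. A minimal sketch follows, assuming those three features (reward, TD-error, current timestep) and a fixed linear scorer standing in for ERO's learned replay policy; the feature construction and weights are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def replay_feature(transition, td_error, current_step):
    # The three inputs named in the quoted statement: the transition's
    # reward, the TD-error, and the current timestep. How they are
    # scaled or combined here is an illustrative assumption.
    _, _, reward, _, _ = transition
    return np.array([reward, abs(td_error), float(current_step)], dtype=np.float32)


def replay_score(feature, weights=(0.4, 0.4, 0.2)):
    # Hypothetical fixed linear scorer; in ERO the mapping from such
    # features to a replay decision is learned, not hand-set.
    score = float(np.dot(feature, np.asarray(weights, dtype=np.float32)))
    return max(score, 1e-6)  # keep the score positive for sampling
```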
“…We suspect this is because we only use the TD-error as the priority. From the analysis of ERO [14], the TD-error is not the best priority metric; sometimes the TD-error metric can harm the training process. All of these hypotheses need to be verified in our future work.…”
Section: Game (mentioning)
confidence: 99%
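
For reference, the TD-error-based priority rule the statement refers to is the standard prioritized-experience-replay weighting. A short sketch, using common default values for the exponent and the small offset rather than values from the cited paper:

```python
import numpy as np


def per_probabilities(td_errors, alpha=0.6, eps=1e-5):
    # Prioritized-experience-replay weighting (Schaul et al. style):
    # priority = (|TD-error| + eps) ** alpha, normalised over the buffer.
    # alpha and eps are common defaults, not values from the cited paper.
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()


# Transitions with large TD-errors dominate the sampling distribution,
# which is the behaviour the quoted statement suggests can sometimes
# harm training.
print(per_probabilities(np.array([0.01, 0.5, 2.0])))
```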