Regret Minimization Experience Replay in Off-Policy Reinforcement Learning
Preprint, 2021
DOI: 10.48550/arxiv.2105.07253

Cited by 1 publication (2 citation statements). References 7 publications.
“…Kumar et al. [56] argue that an algorithm's performance suffers when the optimization objective of the sampling strategy differs from that of the agent. Liu et al. [57] verified this conclusion and, to address the problem, proposed two new algorithms that prioritize experiences with large TD error on the basis of maximizing the Q value.…”
Section: Prioritized Experience Replay
Citation type: mentioning (confidence: 75%)
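For context, "prioritizing experiences with large TD error" is the prioritized-replay mechanism: transitions are sampled with probability proportional to their absolute TD error, so poorly fit transitions are revisited more often. The Python sketch below illustrates that generic mechanism only; the buffer layout, the alpha exponent, and the epsilon floor are common conventions assumed for illustration, not details taken from [56] or [57].

import numpy as np

class PrioritizedReplayBuffer:
    """Samples transitions in proportion to their recorded |TD error|."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha    # >0 skews sampling toward high-priority items
        self.eps = eps        # floor so no transition gets zero probability
        self.data = []        # (state, action, reward, next_state, done)
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so they are
        # sampled at least once before their TD error is known.
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        p = self.priorities[:len(self.data)] ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        # Larger |TD error| -> sampled more often in later batches.
        self.priorities[idx] = np.abs(td_errors) + self.eps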
“…It achieves better results on MuJoCo. Kumar et al. [56] and Liu et al. [57] estimated the Q value more accurately and judged each experience's importance, reducing the instability introduced by the network itself. Kumar et al. [56] argue that an algorithm's performance suffers when the optimization objective of the sampling strategy differs from that of the agent.…”
Section: Prioritized Experience Replay
Citation type: mentioning (confidence: 99%)
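A hedged note on "the instability introduced by the network itself": one standard way to keep the priority signal from chasing the online network's own fluctuations is to compute the TD error against a slowly updated target network. The sketch below assumes a PyTorch-style q_net/target_net pair and batched tensors; these names and shapes are illustrative assumptions, not an API from the cited papers.

import torch

@torch.no_grad()
def td_error_priorities(q_net, target_net, batch, gamma=0.99):
    s, a, r, s_next, done = batch  # tensors with leading batch dim; a is int64
    # Q(s, a) under the online network.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    # Bootstrap from the target network so the priority signal does not
    # track the online network's step-to-step noise.
    q_next = target_net(s_next).max(dim=1).values
    target = r + gamma * (1.0 - done.float()) * q_next
    return (target - q_sa).abs()  # one priority per transition

Magnitudes produced this way can be fed back through update_priorities() in the buffer sketch above to close the prioritization loop.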