Regret Minimization Experience Replay in Off-Policy Reinforcement Learning
Preprint, 2021
DOI: 10.48550/arxiv.2105.07253

Cited by 1 publication (2 citation statements). References 7 publications.
“…Kumar et al. [56] argue that an algorithm's performance suffers when the optimization objective of the sampling strategy differs from that of the agent. Liu et al. [57] verified this conclusion and, to address the problem, proposed two new algorithms that prioritize experiences with large TD error on the basis of maximizing the Q value.…”
Section: Prioritized Experience Replay
Citation type: mentioning (confidence: 75%)
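For context, "prioritizing experiences with large TD error" is the prioritized-replay mechanism: transitions are sampled with probability proportional to their absolute TD error, so poorly fit transitions are revisited more often. The Python sketch below illustrates that generic mechanism only; the buffer layout, the alpha exponent, and the epsilon floor are common conventions assumed for illustration, not details taken from [56] or [57].

import numpy as np

class PrioritizedReplayBuffer:
    """Samples transitions in proportion to their recorded |TD error|."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha    # >0 skews sampling toward high-priority items
        self.eps = eps        # floor so no transition gets zero probability
        self.data = []        # (state, action, reward, next_state, done)
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so they are
        # sampled at least once before their TD error is known.
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        p = self.priorities[:len(self.data)] ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        # Larger |TD error| -> sampled more often in later batches.
        self.priorities[idx] = np.abs(td_errors) + self.eps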
“…It achieves better results on MuJoCo. Kumar et al. [56] and Liu et al. [57] estimated the Q value more accurately and judged each experience's importance, reducing the instability introduced by the network itself. Kumar et al. [56] argue that an algorithm's performance suffers when the optimization objective of the sampling strategy differs from that of the agent.…”
Section: Prioritized Experience Replay
Citation type: mentioning (confidence: 99%)
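A hedged note on "the instability introduced by the network itself": one standard way to keep the priority signal from chasing the online network's own fluctuations is to compute the TD error against a slowly updated target network. The sketch below assumes a PyTorch-style q_net/target_net pair and batched tensors; these names and shapes are illustrative assumptions, not an API from the cited papers.

import torch

@torch.no_grad()
def td_error_priorities(q_net, target_net, batch, gamma=0.99):
    s, a, r, s_next, done = batch  # tensors with leading batch dim; a is int64
    # Q(s, a) under the online network.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    # Bootstrap from the target network so the priority signal does not
    # track the online network's step-to-step noise.
    q_next = target_net(s_next).max(dim=1).values
    target = r + gamma * (1.0 - done.float()) * q_next
    return (target - q_sa).abs()  # one priority per transition

Magnitudes produced this way can be fed back through update_priorities() in the buffer sketch above to close the prioritization loop.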