A novel DDPG method with prioritized experience replay

Hou, Yue; Liu, Lifeng; Wei, Qing; Xu, Xudong; Chen, Chunlin

doi:10.1109/smc.2017.8122622

Cited by 185 publications

(81 citation statements)

References 11 publications


“…Traditional RL samples the experiences from replay buffer with equal importance, which ignores the difference in the value of each experience. Therefore, to improve the learning efficiency and avoid local optima, we adopt the technique of prioritized experience replay in [30]. In prioritized experience replay, the DRL agent draws experiences from replay buffer with weights proportional to their TD-error.…”

Section: Prioritized Experience Replay (mentioning)

confidence: 99%

A Hybrid Learning Framework for Service Function Chaining Across Geo-Distributed Data Centers

Tang

2020

IEEE Access


Service function chaining (SFC) focuses mainly on deploying various network functions in geographically distributed data centers and providing interconnect routing among them. Traditional (convex optimization-based) SFC algorithms have drawbacks in scalability and accuracy. Recent research has shown the effectiveness of deep reinforcement learning (DRL) in the field of SFC. However, current DRL-based algorithms possess an extremely large action space, which leads to slow convergence and poor scalability. Some researchers relieve this issue by reformulating the SFC problem, which usually results in low utilization and high cost. To address this issue, we develop a hybrid DRL-based framework which decouples the VNF deployment and flow routing into different modules. In the proposed framework, a DRL agent is only responsible for learning the policy of VNF deployment. We customize the structure of the agent based on deep deterministic policy gradient (DDPG) and adopt several techniques to improve the learning efficiency, such as adaptive parameter noise, the Wolpertinger policy, and prioritized experience replay. The flow routing is conducted in a game-based module (GBM). We design a decentralized routing algorithm for the GBM to address scalability. The end-to-end latency of flows is minimized while the resource capacity and location constraints are satisfied. During the learning process of the proposed framework, the DRL agent improves its deployment policy with the reward from the GBM (the value of the reward depends on flow routing). Thus, the VNF deployment and flow routing are still jointly optimized. Compared to existing DRL-based algorithms, the proposed hybrid DRL framework can achieve a lower cost since 1) the action space is significantly reduced due to flow routing decoupling; 2) the flow routing procedure is more efficient (the GBM adopts model-based information, e.g., the gradient). Through trace-driven simulations, we show the efficiency of our algorithm compared to existing DRL-based algorithms.

Index terms: service function chaining; deep reinforcement learning; game theory; cloud computing
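
The citation statement above describes drawing experiences from the replay buffer with weights proportional to their TD-error. A minimal Python sketch of that sampling step follows. It is an illustration only, not the implementation from either paper: the function name `sample_proportional`, the use of the absolute TD-error, the small constant `eps`, and the example values are all assumptions.

```python
import random


def sample_proportional(td_errors, batch_size, eps=1e-6, rng=None):
    """Draw buffer indices with probability proportional to |TD-error| + eps.

    Assumption: priorities use the absolute TD-error, and eps keeps
    transitions with zero error sampleable. Sampling is with replacement.
    """
    rng = rng or random.Random()
    priorities = [abs(d) + eps for d in td_errors]
    return rng.choices(range(len(priorities)), weights=priorities, k=batch_size)


# Hypothetical TD-errors for a five-transition buffer.
td_errors = [0.1, -2.0, 0.0, 0.5, -0.05]
batch = sample_proportional(td_errors, batch_size=1000, rng=random.Random(0))
counts = [batch.count(i) for i in range(len(td_errors))]
```

In this sketch the transition with the largest absolute TD-error (index 1) dominates the batch, whereas uniform sampling would draw all five transitions about equally often.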


“…TD-error is an effective metric of priority and it is easy to obtain in DQNs. Not only does the value-based algorithm use PER, but researchers also try PER on policy-based algorithms, such as DDPG + PER [12]. The metrics used in priority are the same as DQN because DDPG also needs to optimize TD loss.…”

Section: Priority Experience Replay (mentioning)

confidence: 99%
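
For reference, the TD-error that the statement above names as the priority metric can be written out for the DDPG critic. The notation is the usual DDPG one (critic Q, target critic Q', target actor μ', discount factor γ); the symbols and the small constant ε are assumptions for illustration and are not taken from the citing paper's equations.

```latex
\delta_i = r_i + \gamma\, Q'\bigl(s_{i+1}, \mu'(s_{i+1})\bigr) - Q(s_i, a_i),
\qquad
p_i = |\delta_i| + \epsilon
```

The only difference from the DQN case is that the target uses the target actor's action in place of a maximum over discrete actions, which is consistent with the statement that the same priority metric carries over.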

“…In this work, we use an improved DQN named Double DQN (DDQN) [22] and DDPG [12]. In DDQN, the definition of target value y shown in Equation 3, which solves the over-optimistic value estimation problem.…”

Section: Preliminary (mentioning)

confidence: 99%

“…From the definition of expectation, It is easy to prove the correctness of Equation (11). According to Equation (6), the importance weighted loss is defined by Equation (12). Figure 5 illustrates the update process.…”

Section: Execution Optimization (mentioning)

confidence: 99%

“…It makes the agent focus on valuable data and speeds up the training process. Subsequently, many works have been proposed to improve DRL based on PER, such as Distributed PER [11], DDPG + PER [12], twice sampling PER [13], ERO [14], ReF-ER [15], and Rainbow [16].…”

Section: Introduction (mentioning)

confidence: 99%


Self-Adaptive Priority Correction for Prioritized Experience Replay

et al. 2020


Deep Reinforcement Learning (DRL) is a promising approach for general artificial intelligence. However, most DRL methods suffer from the problem of data inefficiency. To alleviate this problem, DeepMind proposed Prioritized Experience Replay (PER). Though PER improves data utilization, the priorities of most samples in its Experience Memory (EM) are out of date, as only the priorities of a small part of the data are updated while the Q network parameters are updated. Consequently, the difference between the stored and real priority distributions gradually increases, which introduces bias into the gradients of Deep Q-Learning (DQL) and makes the DQL update move in a non-ideal direction. In this work, we propose a novel self-adaptive priority correction algorithm named Importance-PER (Imp-PER) to fix the update deviation. Specifically, we predict the sum of the real Temporal-Difference error (TD-error) of all data in EM. Data are corrected by an importance weight, which is estimated from the predicted sum and the real TD-error calculated by the latest agent. To control the unbounded importance weight, we use truncated importance sampling with a self-adaptive truncation threshold. Experiments on various Atari 2600 games with Double Deep Q-Network and on MuJoCo with Deep Deterministic Policy Gradient demonstrate that Imp-PER improves data utilization and final policy quality on discrete-state and continuous-state tasks without increasing the computational cost.


A double‐layer crowd evacuation simulation method based on deep reinforcement learning

Zhang, Yang, Zhu

2024

Computer Animation & Virtual


Existing crowd evacuation simulation methods commonly face challenges of low efficiency in path planning and insufficient realism in pedestrian movement during the evacuation process. In this study, we propose a novel crowd evacuation path planning approach based on the learning curve–deep deterministic policy gradient (LC‐DDPG) algorithm. The algorithm incorporates a dynamic experience pool and a priority experience sampling strategy, which improve convergence speed and achieve higher average rewards, thus efficiently enabling global path planning. Building upon this foundation, we introduce a double‐layer method for crowd evacuation using deep reinforcement learning. Specifically, within each group, individuals are categorized into leaders and followers. At the top layer, we employ the LC‐DDPG algorithm to perform global path planning for the leaders. Simultaneously, at the bottom layer, an enhanced social force model guides the followers to avoid obstacles and follow the leaders during evacuation. We implemented a crowd evacuation simulation platform. Experimental results show that our proposed method has high path planning efficiency and can generate more realistic pedestrian trajectories in different scenarios and crowd sizes.
