A dual-memory architecture for reinforcement learning on neuromorphic platforms

Olin-Ammentorp, Wilkie; Sokolov, Yury; Bazhenov, Maxim

doi:10.1088/2634-4386/ac1a64

Cited by 4 publications

(3 citation statements)

References 35 publications

(32 reference statements)

“…Equally, prioritized stochastic memory management (PSMM) [20], combined experience replay (CER) [37], selective experience replay (SER) [17], and episodic memory control (EMC) [40] use experience retention strategies (memory management strategies). In contrast, some replay strategies focus on the structure of the replay memory instead of the content [35], [42], [43]. ERO has proven superior among prioritized selection algorithms, owing to its easy adaptation and generalization to multiple environments [23].…”

Section: Prioritized Sequence Experience Replay (PSER) [39] mentioning

confidence: 99%

Experience Replay Optimization via ESMM for Stable Deep Reinforcement Learning

Osei, Lopez

2024

IJACSA

The memorization and reuse of experience, popularly known as experience replay (ER), has improved the performance of off-policy deep reinforcement learning (DRL) algorithms such as deep Q-networks (DQN) and deep deterministic policy gradients (DDPG). Despite its success, ER faces the challenges of noisy transitions, large memory sizes, and unstable returns. Researchers have introduced replay mechanisms focusing on experience selection strategies to address these issues. However, the choice of experience retention strategy has a significant influence on the selection strategy. Experience Replay Optimization (ERO) is a novel reinforcement learning algorithm that uses a deep replay policy for experience selection. However, ERO relies on the naïve first-in-first-out (FIFO) retention strategy, which seeks to manage replay memory by constantly retaining recent experiences irrespective of their relevance to the agent's learning. FIFO sequentially overwrites the oldest experience with a new one when the replay memory is full. To improve the retention strategy of ERO, we propose an experience replay optimization with enhanced sequential memory management (ERO-ESMM). ERO-ESMM uses an improved sequential retention strategy to manage the replay memory efficiently and stabilize the performance of the DRL agent. The efficacy of the ESMM strategy is evaluated together with five additional retention strategies across four distinct OpenAI environments. The experimental results indicate that ESMM performs better than the other five fundamental retention strategies.
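The FIFO retention strategy described in this abstract can be illustrated with a minimal Python sketch. It is not the ERO or ERO-ESMM implementation; the class name `FIFOReplayBuffer`, its methods, and the uniform sampling step are assumptions chosen only to show how the oldest experience is overwritten once the replay memory is full.

```python
import random
from collections import deque


class FIFOReplayBuffer:
    """Hypothetical fixed-capacity replay memory with FIFO retention."""

    def __init__(self, capacity):
        # A deque with maxlen drops its oldest item when a new one is
        # appended to a full buffer, which is the FIFO overwrite rule.
        self.memory = deque(maxlen=capacity)

    def store(self, transition):
        # Retention ignores relevance: every new transition is kept.
        self.memory.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement (an assumption; ERO itself
        # uses a learned replay policy for selection).
        return rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = FIFOReplayBuffer(capacity=3)
for step in range(5):
    buffer.store(step)  # integers stand in for (s, a, r, s') tuples

retained = list(buffer.memory)  # the two oldest items have been overwritten
```

With a capacity of 3 and five stored items, only the three most recent remain, regardless of how useful the discarded ones were. That is the limitation the abstract attributes to FIFO.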

“…Just as multiple neural networks are used in some RL algorithms to enhance training stability [8,11,15,26,30,31], recent works in ER are exploring the use of a dual-memory architecture. The dualism may come in the form of long and short memory [32] or main and cache [33]. It may also be differentiated based on the sources of the replay data or the ratio of selection from the dual memory.…”

Section: Experience Selection Strategies and Algorithms mentioning

confidence: 99%

“…It may also be differentiated based on the sources of the replay data or the ratio of selection from the dual memory. Olin-Ammentorp et al [32] rely on the complementary learning system of the human brain (interaction between the cortical and hippocampal networks) to design a dual memory (short-term and long-term) replay architecture. However, their design was implemented in a discrete state-action space.…”

Section: Experience Selection Strategies and Algorithms mentioning

confidence: 99%

Experience Replay Optimisation via ATSC and TSC for Performance Stability in Deep RL

Osei, Lopez

2023

Applied Sciences

Catastrophic forgetting is a significant challenge in deep reinforcement learning (RL). To address this problem, researchers introduce the experience replay (ER) concept to complement the training of a deep RL agent. However, the buffer size, experience selection, and experience retention strategies adopted for the ER can negatively affect the agent’s performance stability, especially for complex continuous state-action problems. This paper investigates how to address the stability problem using an enhanced ER method that combines a replay policy network, a dual memory, and an alternating transition selection control (ATSC) mechanism. Two frameworks were designed: an experience replay optimisation via alternating transition selection control (ERO-ATSC) without a transition storage control (TSC) and an ERO-ATSC with a TSC. The first is a hybrid of experience replay optimisation (ERO) and dual-memory experience replay (DER), and the second, which has two versions, integrates a transition storage control (TSC) into the first framework. After comprehensive experimental evaluations of the frameworks on the Pendulum-v0 environment and across multiple buffer sizes, retention strategies, and sampling ratios, the reward version of ERO-ATSC with a TSC exhibits superior performance over the first framework and other novel methods, such as the deep deterministic policy gradient (DDPG) and ERO.
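The idea of drawing a training batch from two memories at a fixed sampling ratio can be sketched as follows. This is only an illustration, not the ERO-ATSC mechanism or the architecture of Olin-Ammentorp et al.; the function `sample_dual`, the parameter `short_ratio`, and the short-term/long-term split used here are hypothetical.

```python
import random


def sample_dual(short_term, long_term, batch_size, short_ratio, rng=random):
    """Draw a mixed batch from two replay memories.

    short_ratio is the (hypothetical) fraction of the batch taken from
    short_term; the remainder is taken from long_term.
    """
    if not 0.0 <= short_ratio <= 1.0:
        raise ValueError("short_ratio must be in [0, 1]")
    # Cap each share by what the corresponding memory actually holds.
    n_short = min(round(batch_size * short_ratio), len(short_term))
    n_long = min(batch_size - n_short, len(long_term))
    batch = rng.sample(list(short_term), n_short)
    batch += rng.sample(list(long_term), n_long)
    rng.shuffle(batch)  # avoid ordering the batch by memory of origin
    return batch


# Integers stand in for transitions: values below 100 are "recent".
short_memory = list(range(10))
long_memory = list(range(100, 120))
mixed = sample_dual(short_memory, long_memory, batch_size=8,
                    short_ratio=0.25, rng=random.Random(0))
```

Here a ratio of 0.25 yields two recent transitions and six older ones per batch of eight. How each memory is filled, and whether the ratio is fixed or alternated during training, are design choices this sketch leaves open.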
