Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence 2018
DOI: 10.24963/ijcai.2018/337

Episodic Memory Deep Q-Networks

Abstract: Reinforcement learning (RL) algorithms have made huge progress in recent years by leveraging the power of deep neural networks (DNN). Despite the success, deep RL algorithms are known to be sample inefficient, often requiring many rounds of interaction with the environment to obtain satisfactory performance. Recently, episodic memory based RL has attracted attention due to its ability to latch onto good actions quickly. In this paper, we present a simple yet effective biologically inspired RL algorithm called Episodic Memory Deep Q-Networks (EMDQN).

Cited by 50 publications (37 citation statements) | References 0 publications

Citation statements (ordered by relevance):
“…It learns faster than Double DQN or N-step DQN in the Atari game Pong, but it impairs DQN's advantage in generalization and lacks continuous, effective use of episodic memory. Recently, the authors of [26] combined the parametric module of DQN with the nonparametric module of episodic control, aiming to improve both sample efficiency and model generalization. The resulting EMDQN method outperforms DQN and surpasses both MFEC and NEC.…”
Section: B. Incorporation of Episodic Memory
confidence: 99%
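
The nonparametric module referenced in this statement is, in MFEC-style episodic control, a table mapping state-action pairs to the highest discounted return ever observed from them. Below is a minimal sketch of such a memory, assuming random-projection keys over flattened 84x84 observations; the class name, key dimension, and binarized keys are illustrative assumptions, not the paper's exact implementation.

import numpy as np

class EpisodicMemory:
    """Nonparametric memory: (state key, action) -> best discounted return."""

    def __init__(self, obs_dim=84 * 84, key_dim=64, gamma=0.99, seed=0):
        rng = np.random.default_rng(seed)
        # Random projection compresses observations into compact keys,
        # in the spirit of MFEC; the dimensions here are assumptions.
        self.proj = rng.normal(size=(key_dim, obs_dim))
        self.gamma = gamma
        self.table = {}

    def _key(self, obs):
        # Binarize the projection so similar states share a hashable key.
        return tuple(np.sign(self.proj @ obs.ravel()))

    def store_episode(self, trajectory):
        """trajectory: list of (obs, action, reward) from one finished episode."""
        ret = 0.0
        for obs, action, reward in reversed(trajectory):
            ret = reward + self.gamma * ret  # discounted return from this step
            k = (self._key(obs), action)
            # Keep only the best return seen for this state-action pair.
            self.table[k] = max(self.table.get(k, float("-inf")), ret)

    def lookup(self, obs, action):
        return self.table.get((self._key(obs), action))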
“…The parameters of EMDQN and DQN are all set the same as in [26]. For HE-EMDQN, the networks and basic hyperparameter settings follow those of DQN.…”
Section: A. Experimental Setup
confidence: 99%
“…One such approach is known as memory consolidation or system-level consolidation (McClelland et al., 1995): an episodic memory system maintains a subset of previously experienced sensorimotor data and replays it, along with the new samples, to the networks during training. Episodic memory systems have also recently been integrated into deep learning systems, such as Deep Q-Networks implementing deep reinforcement learning (RL; Lin et al., 2018).…”
Section: Introduction
confidence: 99%
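
As a concrete illustration of the replay idea described in this statement, here is a hedged sketch of a consolidation buffer that interleaves stored past experiences with fresh samples in each training batch. The capacity, mixing ratio, and class name are assumptions for illustration only.

import random
from collections import deque

class ConsolidationBuffer:
    """Keeps a subset of past transitions and mixes them into new batches."""

    def __init__(self, capacity=100_000):
        self.memory = deque(maxlen=capacity)  # oldest samples drop out first

    def add(self, transition):
        self.memory.append(transition)

    def mixed_batch(self, new_samples, batch_size=32, old_fraction=0.5):
        # Replay old experiences alongside new ones so the network is not
        # trained on fresh data alone (system-level consolidation).
        n_old = min(int(batch_size * old_fraction), len(self.memory))
        old = random.sample(list(self.memory), n_old)
        return old + list(new_samples)[: batch_size - n_old]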
“…ADQN can reduce estimation bias, but it needs more networks, occupies more storage, and lowers computational efficiency. Lin et al. [34] combined episodic control with DQN and proposed episodic memory deep Q-networks (EMDQN), which leverage episodic memory to supervise an agent during training. EMDQN requires fewer interactions with the environment, achieves better sample efficiency, and can also alleviate the overestimation of DQN.…”
Section: Introduction
confidence: 99%
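
The supervision described here can be pictured as an auxiliary loss term that pulls the predicted Q-values toward the best returns retrieved from episodic memory, alongside the usual TD loss. Below is a hedged PyTorch sketch of such a combined objective; the function name, the mean-squared form of the memory term, and the weight lambda_mem are illustrative assumptions rather than the paper's exact formulation.

import torch
import torch.nn.functional as F

def emdqn_style_loss(q_net, target_net, batch, memory_returns,
                     gamma=0.99, lambda_mem=0.1):
    """TD loss plus an episodic-memory regression term (sketch)."""
    obs, actions, rewards, next_obs, dones = batch
    q = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Standard one-step DQN bootstrap target.
        next_q = target_net(next_obs).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * next_q

    td_loss = F.mse_loss(q, td_target)
    # Episodic supervision: regress Q toward the best remembered return
    # H(s, a); `memory_returns` is assumed to come from a memory lookup
    # performed outside this function.
    mem_loss = F.mse_loss(q, memory_returns)
    return td_loss + lambda_mem * mem_loss

Because the memory target is an actual return achieved in the past rather than a bootstrapped estimate, the extra term can counteract the upward bias introduced by the max operator in the TD target, which is one way to read the overestimation claim in the statement above.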