Deep Q-Learning with Prioritized Sampling

Zhai, Jianwei; Liu, Quan; Zhang, Zongzhang; Zhong, Shuncong; Zhu, Haihong; Zhang, Peng; Sun, Cijia

doi:10.1007/978-3-319-46687-3_2

Cited by 23 publications

(15 citation statements)

References 3 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Using only uniform sampling as a way to store experiences in the replay memory proved to have limitations such as that some of the valuable experiences might never be replayed [5]. Attention-based replay memory keeps the uniform sampling and extends it by additionally sampling the experiences that emerged from a specific type of interaction.…”

Section: Model Architecture and Learning Algorithmmentioning

confidence: 99%

See 1 more Smart Citation

Attention-Based Experience Replay in Deep Q-Learning

Ramicic

Bonarini

2017

Proceedings of the 9th International Conference on Machine Learning and Computing

View full text Add to dashboard Cite

Using neural networks as function approximators in temporal difference reinforcement problems proved to be very effective in dealing with high-dimensionality of input state space, especially in more recent developments such as Deep Q-learning. These approaches share the use of a mechanism, called experience replay, that uniformly samples the previous experiences to a memory buffer to exploit them to re-learn, thus improving the efficiency of the learning process. In order to increase the learning performance, techniques such as prioritized experience and prioritized sampling have been introduced to deal with storing and replaying, respectively, the transitions with larger TD error. In this paper, we present a concept, called Attention-Based Experience REplay (ABERE), concerned with selective focusing of the replay buffer to specific types of experiences, therefore modeling the behavioral characteristics of the learning agent in a single and multi-agent environment. We further explore how different behavioral characteristics influence the performance of agents faced with dynamic environment that is able to become more hostile or benevolent by changing the relative probability to get positive or negative reinforcement.

show abstract

Section: Model Architecture and Learning Algorithmmentioning

confidence: 99%

“…Previous approaches have dealt with the dynamics of the replay memory mechanism in order to improve the speed of learning by focusing on the transitions that had a larger TD error in both experience sampling [5] and experience replay [4], but none was concerned about modifying the characteristics of the learning process itself.…”

Section: Introductionmentioning

confidence: 99%

Attention-Based Experience Replay in Deep Q-Learning

Ramicic

Bonarini

2017

Proceedings of the 9th International Conference on Machine Learning and Computing

View full text Add to dashboard Cite

show abstract

“…Recent implementations, such as [9], [10], include a memory buffer called replay memory that is functionally similar to the human working memory: it selectively stores the experiences, or transitions, in order to replay and re-learn from them off-line, therefore reducing the amount of data that should be acquired by expensive processes, and ensuring at the same time a more stable training of the approximator needed to manage continuous variables. Later approaches that dealt with the mechanism of replay memory were interested in improving the speed of learning by focusing the attention on specific transitions that are more valuable to the learning process, using criteria such as temporal difference error [11], received reinforcement [12], and information potential of the state [13]. Usually, in machine learning, the agent prefers unexpected experiences as they are more likely to "surprise" the predictor and feed the learning process in order to further reduce the uncertainty about the environment [14], [15].…”

Section: Introductionmentioning

confidence: 99%

Selective Perception as a Mechanism to Adapt Agents to the Environment: An Evolutionary Approach

Ramicic

Bonarini

2020

IEEE Trans. Cogn. Dev. Syst.

View full text Add to dashboard Cite

Rapid advancement of machine learning makes it possible to consider large amounts of data to learn from. Learning agents may get data ranging on real intervals directly from the environment they interact with, in a process usually time-expensive. To improve learning and manage these data, approximated models and memory mechanisms are adopted. In most of the implementations of reinforcement learning facing this type of data, approximation is obtained by neural networks and the process of drawing information from data is mediated by a short-term memory that stores the previous experiences for additional re-learning, to speed-up the learning process, mimicking what is done by people. In this work, we are proposing a novel computational approach able to selectively filter the information, or cognitive load, for the agent's short-term memory, thus emulating the attention mechanism characteristic of human perception. We devised an evolutionary model of agent's perception to adapt the attention filter present in the proposed architecture to the actual environment faced by the agent, by selecting the experiences that most likely influence in a positive way its learning characteristics. This approach can evolve a filter able to provide an optimal cognitive load of the experiences entering in the agent's short-term memory of a limited capacity. The evolved sampling dynamics can also lead to the emergence of intrinsically motivated curiosity.

show abstract

“…Deep learning is capable of capturing high‐level features from basic signals. Recently, Zhai et al () combined RL with deep learning to provide advantages of both, they called it deep RL. Deep Q‐Learning is one of the deep RL methods that combines Q‐Learning in RL with a deep neural network.…”

Section: Introductionmentioning

confidence: 99%

Trend following deep Q‐Learning strategy for stock trading

Chakole

Kurhekar

2019

Expert Systems

View full text Add to dashboard Cite

Computers and algorithms are widely used to help in stock market decision making. A few questions with regards to the profitability of algorithms for stock trading are can computers be trained to beat the markets? Can an algorithm take decisions for optimal profits? And so forth. In this research work, our objective is to answer some of these questions. We propose an algorithm using deep Q‐Reinforcement Learning techniques to make trading decisions. Trading in stock markets involves potential risk because the price is affected by various uncertain events ranging from political influences to economic constraints. Models that trade using predictions may not always be profitable mainly due to the influence of various unknown factors in predicting the future stock price. Trend Following is a trading idea in which, trading decisions, like buying and selling, are taken purely according to the observed market trend. A stock trend can be up, down, or sideways. Trend Following does not predict the stock price but follows the reversals in the trend direction. A trend reversal can be used to trigger a buy or a sell of a certain stock. In this research paper, we describe a deep Q‐Reinforcement Learning agent able to learn the Trend Following trading by getting rewarded for its trading decisions. Our results are based on experiments performed on the actual stock market data of the American and the Indian stock markets. The results indicate that the proposed model outperforms forecasting‐based methods in terms of profitability. We also limit risk by confirming trading actions with the trend before actual trading.

show abstract

Deep Q-Learning with Prioritized Sampling

Cited by 23 publications

References 3 publications

Attention-Based Experience Replay in Deep Q-Learning

Attention-Based Experience Replay in Deep Q-Learning

Selective Perception as a Mechanism to Adapt Agents to the Environment: An Evolutionary Approach

Trend following deep Q‐Learning strategy for stock trading

Contact Info

Product

Resources

About