1993
DOI: 10.1007/bf00993104

Prioritized sweeping: Reinforcement learning with less data and less time

Abstract: We present a new algorithm, prioritized sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as temporal differencing and Q-learning have real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state-space. We compare…
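As a rough illustration of the idea in the abstract, here is a minimal sketch of prioritized sweeping for a deterministic tabular model (the paper's algorithm handles stochastic Markov systems via learned transition probabilities); GAMMA, THETA, and N_SWEEPS are illustrative choices, not values from the paper:

```python
import heapq
from collections import defaultdict

GAMMA = 0.95    # discount factor (illustrative)
THETA = 1e-4    # priority threshold for queue insertion (illustrative)
N_SWEEPS = 10   # planning backups per real experience (illustrative)

Q = defaultdict(float)            # Q[(s, a)] -> value estimate
model = {}                        # model[(s, a)] -> (reward, next_state)
predecessors = defaultdict(set)   # state -> {(s, a) observed to reach it}
pqueue = []                       # max-priority queue via negated priorities

def priority(s, a, r, s2, actions):
    """Magnitude of the one-step Bellman error for (s, a)."""
    target = r + GAMMA * max(Q[(s2, a2)] for a2 in actions)
    return abs(target - Q[(s, a)])

def observe(s, a, r, s2, actions):
    """Record one real transition, then sweep high-priority pairs."""
    model[(s, a)] = (r, s2)
    predecessors[s2].add((s, a))
    p = priority(s, a, r, s2, actions)
    if p > THETA:
        heapq.heappush(pqueue, (-p, (s, a)))
    for _ in range(N_SWEEPS):
        if not pqueue:
            break
        _, (s1, a1) = heapq.heappop(pqueue)
        r1, s2_ = model[(s1, a1)]
        Q[(s1, a1)] = r1 + GAMMA * max(Q[(s2_, a2)] for a2 in actions)
        # Backward propagation: this update may enlarge the Bellman
        # error of every predecessor of s1, so those pairs are re-queued.
        for (sp, ap) in predecessors[s1]:
            rp, _ = model[(sp, ap)]
            pp = priority(sp, ap, rp, s1, actions)
            if pp > THETA:
                heapq.heappush(pqueue, (-pp, (sp, ap)))
```

On a toy chain, a single rewarding transition propagates back to the start state within one call, which is the point of prioritizing sweeps by Bellman error:

```python
acts = [0]
observe(0, 0, 0.0, 1, acts)   # no reward yet, nothing queued
observe(1, 0, 1.0, 2, acts)   # reward found; Q[(1,0)] = 1.0, Q[(0,0)] = 0.95
```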

Cited by 444 publications (472 citation statements)
References 13 publications
“…Many variants of traditional RL exist (e.g., Barto et al., 1983; Watkins, 1989; Watkins and Dayan, 1992; Moore and Atkeson, 1993; Schwartz, 1993; Rummery and Niranjan, 1994; Singh, 1994; Baird, 1995; Kaelbling et al., 1995; Peng and Williams, 1996; Mahadevan, 1996; Tsitsiklis and van Roy, 1996; Bradtke et al., 1996; Santamaría et al., 1997; Prokhorov and Wunsch, 1997; Sutton and Barto, 1998; Wiering and Schmidhuber, 1998b; Baird and Moore, 1999; Meuleau et al., 1999; Morimoto and Doya, 2000; Bertsekas, 2001; Brafman and Tennenholtz, 2002; Abounadi et al., 2002; Lagoudakis and Parr, 2003; Sutton et al., 2008; Maei and Sutton, 2010; van Hasselt, 2012). Most are formulated in a probabilistic framework, and evaluate pairs of input and output (action) events (instead of input events only).…”
Section: Deep FNNs for Traditional RL and Markov Decision Processes (mentioning)
confidence: 99%
“…All of the approaches we will describe will form explicit world models. Moore and Atkeson (1993) explore some of the advantages and disadvantages of approaches that form explicit models versus those that avoid forming models. Often the modeling process is equated with function approximation, in which a representational tool is used to fit a training data set.…”
Section: Introduction (mentioning)
confidence: 99%
“…In other words, we cannot expect any 'almighty' method to have superior performance for all problems (Moore & Atkeson, 1993; Sutton & Barto, 1998). This, of course, is true of the proposed method, which has some crucial limitations.…”
Section: Limitations (mentioning)
confidence: 93%
“…For example, 'exploration bonus' (Dayan & Sejnowski, 1996; Sutton, 1990) places additional weight on states that the agent has not visited recently. In 'prioritized sweeping' (Moore & Atkeson, 1993), the system puts the present state into the priority queue when the change in the state transition probability exceeds a given threshold. Including algorithms in the literature of artificial intelligence (Brafman & Tennenholtz, 2000; Kearns & Singh, 1998), most conventional studies have been based on model-based learning systems, that is, the systems included a state transition matrix and a reward matrix.…”
Section: Introduction (mentioning)
confidence: 99%
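The excerpt above points at the queue-insertion rule for the stochastic-model setting. As a hedged sketch of how such a rule can look (the names queue_predecessors, T_hat, and theta are hypothetical, not from the paper): when the estimated value of a state s changes by delta, each predecessor pair is queued with a priority that scales |delta| by the model's estimated probability of reaching s.

```python
import heapq

def queue_predecessors(pqueue, s, delta, T_hat, theta=1e-4):
    """Queue (s_bar, a_bar) pairs whose backup could change by more
    than theta after the value of state s shifts by delta.

    T_hat maps (s_bar, a_bar, s_next) -> estimated P(s_next | s_bar, a_bar).
    """
    for (s_bar, a_bar, s_next), prob in T_hat.items():
        if s_next != s:
            continue
        p = prob * abs(delta)   # expected magnitude of the Bellman change
        if p > theta:
            heapq.heappush(pqueue, (-p, (s_bar, a_bar)))
```

Scanning all of T_hat here is only for clarity; a practical implementation would index predecessors by successor state, as in the first sketch above.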