1993
DOI: 10.1007/bf00993104

Prioritized sweeping: Reinforcement learning with less data and less time

Abstract: We present a new algorithm, prioritized sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as temporal differencing and Q-learning have real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state-space. We compare…
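As a rough illustration of the idea in the abstract, here is a minimal sketch of prioritized sweeping for a deterministic tabular model (the paper's algorithm handles stochastic Markov systems via learned transition probabilities); GAMMA, THETA, and N_SWEEPS are illustrative choices, not values from the paper:

```python
import heapq
from collections import defaultdict

GAMMA = 0.95    # discount factor (illustrative)
THETA = 1e-4    # priority threshold for queue insertion (illustrative)
N_SWEEPS = 10   # planning backups per real experience (illustrative)

Q = defaultdict(float)            # Q[(s, a)] -> value estimate
model = {}                        # model[(s, a)] -> (reward, next_state)
predecessors = defaultdict(set)   # state -> {(s, a) observed to reach it}
pqueue = []                       # max-priority queue via negated priorities

def priority(s, a, r, s2, actions):
    """Magnitude of the one-step Bellman error for (s, a)."""
    target = r + GAMMA * max(Q[(s2, a2)] for a2 in actions)
    return abs(target - Q[(s, a)])

def observe(s, a, r, s2, actions):
    """Record one real transition, then sweep high-priority pairs."""
    model[(s, a)] = (r, s2)
    predecessors[s2].add((s, a))
    p = priority(s, a, r, s2, actions)
    if p > THETA:
        heapq.heappush(pqueue, (-p, (s, a)))
    for _ in range(N_SWEEPS):
        if not pqueue:
            break
        _, (s1, a1) = heapq.heappop(pqueue)
        r1, s2_ = model[(s1, a1)]
        Q[(s1, a1)] = r1 + GAMMA * max(Q[(s2_, a2)] for a2 in actions)
        # Backward propagation: this update may enlarge the Bellman
        # error of every predecessor of s1, so those pairs are re-queued.
        for (sp, ap) in predecessors[s1]:
            rp, _ = model[(sp, ap)]
            pp = priority(sp, ap, rp, s1, actions)
            if pp > THETA:
                heapq.heappush(pqueue, (-pp, (sp, ap)))
```

On a toy chain, a single rewarding transition propagates back to the start state within one call, which is the point of prioritizing sweeps by Bellman error:

```python
acts = [0]
observe(0, 0, 0.0, 1, acts)   # no reward yet, nothing queued
observe(1, 0, 1.0, 2, acts)   # reward found; Q[(1,0)] = 1.0, Q[(0,0)] = 0.95
```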

Cited by 444 publications (472 citation statements)
References 13 publications
“…Many variants of traditional RL exist (e.g., Barto et al., 1983; Watkins, 1989; Watkins and Dayan, 1992; Moore and Atkeson, 1993; Schwartz, 1993; Rummery and Niranjan, 1994; Singh, 1994; Baird, 1995; Kaelbling et al., 1995; Peng and Williams, 1996; Mahadevan, 1996; Tsitsiklis and van Roy, 1996; Bradtke et al., 1996; Santamaría et al., 1997; Prokhorov and Wunsch, 1997; Sutton and Barto, 1998; Wiering and Schmidhuber, 1998b; Baird and Moore, 1999; Meuleau et al., 1999; Morimoto and Doya, 2000; Bertsekas, 2001; Brafman and Tennenholtz, 2002; Abounadi et al., 2002; Lagoudakis and Parr, 2003; Sutton et al., 2008; Maei and Sutton, 2010; van Hasselt, 2012). Most are formulated in a probabilistic framework, and evaluate pairs of input and output (action) events (instead of input events only).…”
Section: Deep FNNs for Traditional RL and Markov Decision Processes (mentioning)
confidence: 99%
“…All of the approaches we will describe will form explicit world models. Moore and Atkeson (1993) explore some of the advantages and disadvantages of approaches that form explicit models versus those that avoid forming models. Often the modeling process is equated with function approximation, in which a representational tool is used to fit a training data set.…”
Section: Introduction (mentioning)
confidence: 99%
“…In other words, we cannot expect any 'almighty' method to have superior performance for all problems (Moore & Atkeson, 1993; Sutton & Barto, 1998). This, of course, is true of the proposed method, which has some crucial limitations.…”
Section: Limitations (mentioning)
confidence: 93%
“…For example, 'exploration bonus' (Dayan & Sejnowski, 1996; Sutton, 1990) places additional weight on states that the agent has not visited recently. In 'prioritized sweeping' (Moore & Atkeson, 1993), the system puts the present state into the priority queue when the change in the state transition probability exceeds a given threshold. Including algorithms in the literature of artificial intelligence (Brafman & Tennenholtz, 2000; Kearns & Singh, 1998), most conventional studies have been based on model-based learning systems, that is, the systems included a state transition matrix and a reward matrix.…”
Section: Introduction (mentioning)
confidence: 99%
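The excerpt above points at the queue-insertion rule for the stochastic-model setting. As a hedged sketch of how such a rule can look (the names queue_predecessors, T_hat, and theta are hypothetical, not from the paper): when the estimated value of a state s changes by delta, each predecessor pair is queued with a priority that scales |delta| by the model's estimated probability of reaching s.

```python
import heapq

def queue_predecessors(pqueue, s, delta, T_hat, theta=1e-4):
    """Queue (s_bar, a_bar) pairs whose backup could change by more
    than theta after the value of state s shifts by delta.

    T_hat maps (s_bar, a_bar, s_next) -> estimated P(s_next | s_bar, a_bar).
    """
    for (s_bar, a_bar, s_next), prob in T_hat.items():
        if s_next != s:
            continue
        p = prob * abs(delta)   # expected magnitude of the Bellman change
        if p > theta:
            heapq.heappush(pqueue, (-p, (s_bar, a_bar)))
```

Scanning all of T_hat here is only for clarity; a practical implementation would index predecessors by successor state, as in the first sketch above.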