Machine Learning Proceedings 1994
DOI: 10.1016/b978-1-55860-335-6.50035-0
Incremental Multi-Step Q-Learning

Cited by 128 publications (119 citation statements)
References 10 publications (6 reference statements)
“…Many variants of traditional RL exist (e.g., Barto et al., 1983; Watkins, 1989; Watkins and Dayan, 1992; Moore and Atkeson, 1993; Schwartz, 1993; Rummery and Niranjan, 1994; Singh, 1994; Baird, 1995; Kaelbling et al., 1995; Peng and Williams, 1996; Mahadevan, 1996; Tsitsiklis and van Roy, 1996; Bradtke et al., 1996; Santamaría et al., 1997; Prokhorov and Wunsch, 1997; Sutton and Barto, 1998; Wiering and Schmidhuber, 1998b; Baird and Moore, 1999; Meuleau et al., 1999; Morimoto and Doya, 2000; Bertsekas, 2001; Brafman and Tennenholtz, 2002; Abounadi et al., 2002; Lagoudakis and Parr, 2003; Sutton et al., 2008; Maei and Sutton, 2010; van Hasselt, 2012). Most are formulated in a probabilistic framework, and evaluate pairs of input and output (action) events (instead of input events only).…”
Section: Deep FNNs for Traditional RL and Markov Decision Processes (mentioning)
confidence: 99%
“…First note that Q-learning's update at time t+1 may change V(s_{t+1}) in the definition of e_t. Following Peng & Williams (1996) we define the TD(0)-error of V(s_{t+1}) as…”
Section: Q(λ)-Learning (mentioning)
confidence: 99%
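The quoted statement is truncated before the definition it announces. For orientation only, the conventional TD(0)-error of a state-value estimate V, with V_t(s) = max_a Q_t(s, a) in the Q-learning setting, takes the form below. This is a hedged reconstruction of the standard quantity, not the equation from the quoted paper, whose exact indexing may differ.

```latex
% Conventional TD(0)-error at step t (standard form, not the quoted text),
% where the value estimate is derived from the Q-function:
%   V_t(s) = \max_a Q_t(s, a)
e_t = r_t + \gamma \, V_t(s_{t+1}) - V_t(s_t)
```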
“…Q(λ)-learning (Watkins, 1989; Peng & Williams, 1996) is an important reinforcement learning (RL) method. It combines Q-learning (Watkins, 1989; Watkins & Dayan, 1992) and TD(λ) (Sutton, 1988; Tesauro, 1992).…”
Section: Introduction (mentioning)
confidence: 99%
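The statements above describe Q(λ)-learning as Q-learning combined with TD(λ) eligibility traces. As a rough illustration only, here is a minimal tabular sketch of the related Watkins' Q(λ) variant (which cuts traces after exploratory actions), not the Peng & Williams algorithm from the cited paper; the `env` interface and all names are assumptions made for the example.

```python
import numpy as np

def epsilon_greedy(Q, s, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def q_lambda_episode(env, Q, alpha=0.1, gamma=0.99, lam=0.9, epsilon=0.1, seed=0):
    """One episode of tabular Watkins' Q(lambda) with replacing traces.

    Assumed (illustrative) interface: env.reset() -> state,
    env.step(action) -> (next_state, reward, done); Q is an
    [n_states, n_actions] array indexed by integer states and actions.
    """
    rng = np.random.default_rng(seed)
    e = np.zeros_like(Q)                  # eligibility traces
    s = env.reset()
    a = epsilon_greedy(Q, s, epsilon, rng)
    done = False
    while not done:
        s2, r, done = env.step(a)
        a2 = epsilon_greedy(Q, s2, epsilon, rng)
        a_star = int(np.argmax(Q[s2]))    # greedy action at the next state
        target = r if done else r + gamma * Q[s2, a_star]
        delta = target - Q[s, a]          # one-step Q-learning (TD) error
        e[s, a] = 1.0                     # replacing trace for the visited pair
        Q += alpha * delta * e            # propagate the error along all traces
        if a2 == a_star:
            e *= gamma * lam              # decay traces after a greedy action
        else:
            e[:] = 0.0                    # Watkins' variant: cut traces on exploration
        s, a = s2, a2
    return Q
```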
“…One of the most widely known and promising EF-based approaches to reinforcement learning is TD-Q learning (Sutton, 1988; Watkins, 1989; Peng & Williams, 1996; Wiering & Schmidhuber, 1997). We use an offline TD(λ) Q-variant (Lin, 1993 …”
Section: TD-Q Learning (mentioning)
confidence: 99%
“…In our case study we compare two learning algorithms, each representative of its class: TD-Q learning (Lin, 1993; Peng & Williams, 1996; Wiering & Schmidhuber, 1997) with linear neural networks (TD-Q) and Probabilistic Incremental Program Evolution (PIPE, Salustowicz & Schmidhuber, 1997). We also report results for a PIPE variant based on coevolution (CO-PIPE, Salustowicz et al., 1997).…”
Section: Introduction (mentioning)
confidence: 99%