Incremental Multi-Step Q-Learning

Peng, Jing; Williams, Ronald J.

doi:10.1016/b978-1-55860-335-6.50035-0

Cited by 128 publications

(119 citation statements)

References 10 publications

(6 reference statements)

Supporting

Mentioning

111

Contrasting

Unclassified

“…Many variants of traditional RL exist (e.g., Barto et al, 1983; Watkins, 1989; Watkins and Dayan, 1992; Moore and Atkeson, 1993; Schwartz, 1993; Rummery and Niranjan, 1994; Singh, 1994; Baird, 1995; Kaelbling et al, 1995; Peng and Williams, 1996; Mahadevan, 1996; Tsitsiklis and van Roy, 1996; Bradtke et al, 1996; Santamaría et al, 1997; Prokhorov and Wunsch, 1997; Sutton and Barto, 1998; Wiering and Schmidhuber, 1998b; Baird and Moore, 1999; Meuleau et al, 1999; Morimoto and Doya, 2000; Bertsekas, 2001; Brafman and Tennenholtz, 2002; Abounadi et al, 2002; Lagoudakis and Parr, 2003; Sutton et al, 2008; Maei and Sutton, 2010; van Hasselt, 2012). Most are formulated in a probabilistic framework, and evaluate pairs of input and output (action) events (instead of input events only).…”

Section: Deep FNNs for Traditional RL and Markov Decision Processes (mentioning)

confidence: 99%

Deep learning in neural networks: An overview

2015

In recent years, deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarises relevant work, much of it from the previous millennium. Shallow and deep learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

LaTeX source: http://www.idsia.ch/~juergen/DeepLearning8Oct2014.tex
Complete BibTeX file (888 kB): http://www.idsia.ch/~juergen/deep.bib

Preface: This is the preprint of an invited Deep Learning (DL) overview. One of its goals is to assign credit to those who contributed to the present state of the art. I acknowledge the limitations of attempting to achieve this goal. The DL research community itself may be viewed as a continually evolving, deep network of scientists who have influenced each other in complex ways. Starting from recent DL results, I tried to trace back the origins of relevant ideas through the past half century and beyond, sometimes using "local search" to follow citations of citations backwards in time. Since not all DL publications properly acknowledge earlier relevant work, additional global search strategies were employed, aided by consulting numerous neural network experts. As a result, the present preprint mostly consists of references. Nevertheless, through an expert selection bias I may have missed important work. A related bias was surely introduced by my special familiarity with the work of my own DL research group in the past quarter-century. For these reasons, this work should be viewed as merely a snapshot of an ongoing credit assignment process. To help improve it, please do not hesitate to send corrections and suggestions to juergen@idsia.ch.

“…First note that Q-learning's update at time t+1 may change V(s_{t+1}) in the definition of e_t. Following Peng & Williams (1996) we define the TD(0)-error of V(s_{t+1}) as…”

Section: Q(λ)-learning (mentioning)

confidence: 99%

“…Q(λ)-learning (Watkins, 1989; Peng & Williams, 1996) is an important reinforcement learning (RL) method. It combines Q-learning (Watkins, 1989; Watkins & Dayan, 1992) and TD(λ) (Sutton, 1988; Tesauro, 1992).…”

Section: Introduction (mentioning)

confidence: 99%
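
The combination of Q-learning with TD(λ) mentioned in the statement above can be illustrated with a minimal tabular sketch. It follows the textbook form of Watkins's Q(λ) with accumulating eligibility traces that are cut after a non-greedy action; it is not claimed to be the Peng and Williams variant or the algorithm of the citing paper. The function names, the three-state toy episode, and the parameter values are assumptions chosen for illustration.

```python
def watkins_q_lambda_step(Q, traces, s, a, r, s_next, done, next_action,
                          alpha=0.5, gamma=0.9, lam=0.8):
    """Apply one Watkins-style Q(lambda) backup in place; return the TD error.

    Q is a list of per-state lists of action values; traces maps
    (state, action) pairs to eligibility values. Both are hypothetical
    containers chosen for this sketch.
    """
    # One-step Q-learning error, bootstrapping from the greedy next value.
    best_next = 0.0 if done else max(Q[s_next])
    delta = r + gamma * best_next - Q[s][a]

    # Traces survive only if the action actually taken next is greedy.
    next_is_greedy = (not done) and Q[s_next][next_action] == best_next

    # Accumulating trace for the pair just visited.
    traces[(s, a)] = traces.get((s, a), 0.0) + 1.0

    # Every recently visited pair shares the current error, weighted by its trace.
    for (si, ai), e in list(traces.items()):
        Q[si][ai] += alpha * delta * e
        if next_is_greedy:
            traces[(si, ai)] = gamma * lam * e  # TD(lambda)-style decay
        else:
            del traces[(si, ai)]                # cut after exploration or episode end
    return delta


def demo():
    """Two-step episode 0 -> 1 -> 2 (terminal) with reward 1 on the last step."""
    Q = [[0.0, 0.0] for _ in range(3)]
    traces = {}
    watkins_q_lambda_step(Q, traces, s=0, a=1, r=0.0, s_next=1,
                          done=False, next_action=1)
    watkins_q_lambda_step(Q, traces, s=1, a=1, r=1.0, s_next=2,
                          done=True, next_action=None)
    return Q, traces
```

In this toy episode, plain one-step Q-learning would leave the value of (state 0, action 1) at zero after the first pass. With traces, that pair receives a share of the final error scaled by γλ, which is the multi-step credit assignment the quoted sentence refers to.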

Untitled

Wiering

Schmidhuber

1998

Machine Learning

“…One of the most widely known and promising EF-based approaches to reinforcement learning is TD-Q learning (Sutton, 1988; Watkins, 1989; Peng & Williams, 1996; Wiering & Schmidhuber, 1997). We use an offline TD(λ) Q-variant (Lin, 1993 …”

Section: TD-Q Learning (mentioning)

confidence: 99%

“…In our case study we compare two learning algorithms, each representative of its class: TD-Q learning (Lin, 1993; Peng & Williams, 1996; Wiering & Schmidhuber, 1997) with linear neural networks (TD-Q) and Probabilistic Incremental Program Evolution (PIPE, Salustowicz & Schmidhuber, 1997). We also report results for a PIPE variant based on coevolution (CO-PIPE, Salustowicz et al, 1997).…”

Section: Introduction (mentioning)

confidence: 99%

Untitled

1998

Abstract. We use simulated soccer to study multiagent learning. Each team's players (agents) share action set and policy, but may behave differently due to position-dependent inputs. All agents making up a team are rewarded or punished collectively in case of goals. We conduct simulations with varying team sizes, and compare several learning algorithms: TD-Q learning with linear neural networks (TD-Q), Probabilistic Incremental Program Evolution (PIPE), and a PIPE version that learns by coevolution (CO-PIPE). TD-Q is based on learning evaluation functions (EFs) mapping input/action pairs to expected reward. PIPE and CO-PIPE search policy space directly. They use adaptive probability distributions to synthesize programs that calculate action probabilities from current inputs. Our results show that linear TD-Q encounters several difficulties in learning appropriate shared EFs. PIPE and CO-PIPE, however, do not depend on EFs and find good policies faster and more reliably. This suggests that in some multiagent learning scenarios direct search in policy space can offer advantages over EF-based approaches.
