2012
DOI: 10.1007/978-3-642-27645-3_7

Reinforcement Learning in Continuous State and Action Spaces

Abstract: Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action spaces, which can make learning a good decision policy even more involved. In this chapter we discuss how to automatically find good decision policies in continuous domains. Because analytically computing a good polic…

Cited by 152 publications (125 citation statements)
References: 125 publications
“…Many variants of traditional RL exist (e.g., Barto et al., 1983; Watkins, 1989; Watkins and Dayan, 1992; Moore and Atkeson, 1993; Schwartz, 1993; Rummery and Niranjan, 1994; Singh, 1994; Baird, 1995; Kaelbling et al., 1995; Peng and Williams, 1996; Mahadevan, 1996; Tsitsiklis and van Roy, 1996; Bradtke et al., 1996; Santamaría et al., 1997; Prokhorov and Wunsch, 1997; Sutton and Barto, 1998; Wiering and Schmidhuber, 1998b; Baird and Moore, 1999; Meuleau et al., 1999; Morimoto and Doya, 2000; Bertsekas, 2001; Brafman and Tennenholtz, 2002; Abounadi et al., 2002; Lagoudakis and Parr, 2003; Sutton et al., 2008; Maei and Sutton, 2010; van Hasselt, 2012). Most are formulated in a probabilistic framework, and evaluate pairs of input and output (action) events (instead of input events only).…”
Section: Deep FNNs for Traditional RL and Markov Decision Processes (mentioning)
confidence: 99%
“…The DQN learning target for the Q(s, a) function is defined through the maximum of the estimated action values, which can be biased in stochastic environments and hence result in overestimation. The double Q-network (van Hasselt et al., 2015) reduces this overestimation by combining double Q-learning with deep models, and can thus be used to approximate large-scale value functions. The deep deterministic policy gradient (DDPG) method (Lillicrap et al., 2015) improves the robustness of gradient estimation for deep continuous control models.…”
Section: Trends in the Development of AI Technology Applications for… (mentioning)
confidence: 99%
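The decoupling performed by the double Q-network described above is easiest to see in code. The following is a minimal sketch, assuming PyTorch and two illustrative Q-networks named online_net and target_net (these names and the gamma default are assumptions for illustration, not taken from the cited papers); it contrasts the plain DQN target with the Double DQN target.

```python
import torch

def dqn_target(reward, next_state, done, target_net, gamma=0.99):
    # Plain DQN target: the target network both selects and evaluates the
    # greedy next action, so noisy value estimates are maximized over.
    with torch.no_grad():
        max_q = target_net(next_state).max(dim=1).values
    return reward + gamma * (1.0 - done) * max_q

def double_dqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    # Double DQN target: the online network selects the action, the target
    # network evaluates it, decoupling selection from evaluation.
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)
        q_eval = target_net(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * (1.0 - done) * q_eval
```

In the plain target, the same network both selects and evaluates the maximizing action, so estimation noise propagates through the max; in the double target, selection and evaluation use different parameter sets, which is what reduces the overestimation mentioned above.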
“…This increases the probability of overestimating the value of the state-action pairs (van Hasselt, 2010; van Hasselt et al., 2015). To see this more clearly, the target part of the loss in Equation 4 can be rewritten as follows:…”
Section: Double DQN: Overcoming Overestimation and Instability of DQN (mentioning)
confidence: 99%
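The rewritten target this excerpt refers to is cut off in the quote; as a hedged reconstruction of the standard derivation (the symbols θ for the online network and θ⁻ for the target network are assumptions about the citing paper's notation), the DQN target can be rewritten to expose where action selection and evaluation coincide, and Double DQN then decouples them:

\[
y^{\mathrm{DQN}} = r + \gamma \max_{a'} Q(s', a'; \theta^-)
= r + \gamma\, Q\!\left(s', \operatorname*{arg\,max}_{a'} Q(s', a'; \theta^-); \theta^-\right),
\]
\[
y^{\mathrm{DoubleDQN}} = r + \gamma\, Q\!\left(s', \operatorname*{arg\,max}_{a'} Q(s', a'; \theta); \theta^-\right).
\]

Because the arg max and the evaluation in the first expression share the same parameters, upward estimation errors are systematically selected, which is the overestimation the quote describes.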
“…We analyse four deep RL models: Deep Q-Networks (DQN) (Mnih et al., 2013), Double DQN (DDQN) (van Hasselt et al., 2015), Deep Advantage Actor-Critic (DA2C) (Sutton et al., 2000), and a version of DA2C initialized with supervised learning (TDA2C)¹ (a similar idea to Silver et al. (2016)). All models are trained on a restaurant-seeking domain.…”
Section: Introduction (mentioning)
confidence: 99%