Reinforcement Learning: A Survey

Kaelbling, Leslie Pack; Littman, Michael L.; Moore, Andrew W.

doi:10.1613/jair.301

Cited by 6,374 publications

(3,463 citation statements)

References 93 publications

Supporting

Mentioning: 3,347

Contrasting

Unclassified: 107


“…RL as a science is relatively young and has already made a considerable impact on operations research. The optimism expressed about RL in the early surveys (Keerthi and Ravindran, 1994; Kaelbling et al., 1996; Mahadevan, 1996) has been bolstered by several success stories.…”

Section: Results (mentioning)

confidence: 99%

Reinforcement Learning: A Tutorial Survey and Recent Advances

Gosavi

2009

INFORMS Journal on Computing

258


In the last few years, Reinforcement Learning (RL), also called adaptive (or approximate) dynamic programming (ADP), has emerged as a powerful tool for solving complex sequential decision-making problems in control theory. Although seminal research in this area was performed in the artificial intelligence (AI) community, more recently, it has attracted the attention of optimization theorists because of several noteworthy success stories from operations management. It is on large-scale and complex problems of dynamic optimization, in particular the Markov decision problem (MDP) and its variants, that the power of RL becomes more obvious. It has been known for many years that on large-scale MDPs, the curse of dimensionality and the curse of modeling render classical dynamic programming (DP) ineffective. The excitement in RL stems from its direct attack on these curses, allowing it to solve problems that were considered intractable, via classical DP, in the past. The success of RL is due to its strong mathematical roots in the principles of DP, Monte Carlo simulation, function approximation, and AI. Topics treated in some detail in this survey are: Temporal differences, Q-Learning, semi-MDPs and stochastic games. Several recent advances in RL, e.g., policy gradients and hierarchical RL, are covered along with references. Pointers to numerous examples of applications are provided. This overview is aimed at uncovering the mathematical roots of this science, so that readers gain a clear understanding of the core concepts and are able to use them in their own research. The survey points to more than 100 references from the literature.

“…A value system can be used for regulating behavior and modulating learning (McFarland & Boesser, 1993; Pfeifer & Scheier, 1998). Typically, in reinforcement learning approaches (e.g., Kaelbling, Littman, & Moore, 1996) and other adaptive models, such as map learning (e.g., Burgess, Recce, & O'Keefe, 1994), the value system is externally imposed by the experimenter. For example, certain sensory configurations or locations of the environment are associated with positive rewards or can trigger synaptic changes.…”

Section: Discussion (mentioning)

confidence: 99%

Evolutionary neurocontrollers for autonomous mobile robots

1998


In this article we describe a methodology for evolving neurocontrollers of autonomous mobile robots without human intervention. The presentation, which spans from technological and methodological issues to several experimental results on evolution of physical mobile robots, covers both previous and recent work in the attempt to provide a unified picture within which the reader can compare the effects of systematic variations on the experimental settings. After describing some key principles for building mobile robots and tools suitable for experiments in adaptive robotics, we give an overview of different approaches to evolutionary robotics and present our methodology. We start by reviewing two basic experiments showing that different environments can shape very different behaviors and neural mechanisms under very similar selection criteria. We then address the issue of incremental evolution in two different experiments from the perspective of changing environments and robot morphologies. Finally, we investigate the possibility of evolving plastic neurocontrollers and analyze an evolved neurocontroller that relies on fast and continuously changing synapses characterized by dynamic stability. We conclude by reviewing the implications of this methodology for engineering, biology, cognitive science, and artificial life, and point at future directions of research.

“…In contrast to other reinforcement learners, policy iterators directly manipulate the policy π. Another example for policy iterators are evolutionary algorithms [31]. Lazy learning: In artificial intelligence, lazy learning is a learning method in which generalization beyond the training data is delayed until a query is made to the system, as opposed to in eager learning, where the system tries to generalize the training data before receiving queries. The main advantage gained in employing a lazy learning method, such as case-based reasoning [19], is that the target function will be approximated locally, such as in the k-nearest neighbor algorithm. Because the target function is approximated locally for each query to the system, lazy learning systems can simultaneously solve multiple problems and deal successfully with changes in the problem domain.…”

Section: Taxonomy Of Supervised Learning Algorithms (mentioning)

confidence: 99%

The Classification of the Applicable Machine Learning Methods in Robot Manipulators

Hormozi, Hormozi, Nohooji

2012

IJMLC


Abstract: Supervised machine learning is the search for algorithms that reason from externally supplied instances to produce general hypotheses, which then make predictions about future instances. In other words, the goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to the testing instances where the values of the predictor features are known, but the value of the class label is unknown. This paper describes various supervised machine learning classification techniques used in robotic manipulators. Of course, a single article cannot be a complete review of all supervised machine learning classification algorithms (also known as induction classification algorithms), yet we hope that the references cited will cover the major theoretical issues, guiding the researcher in interesting research directions and suggesting possible bias combinations that have yet to be explored.

Index Terms: Machine learning, adaptive control, repetitive control, robot manipulators.

I. INTRODUCTION

Machine learning, a branch of artificial intelligence, is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data [1]. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data; the difficulty lies in the fact that the set of all possible behaviors given all possible inputs is too large to be covered by the set of observed examples (training data). Hence the learner must generalize from the given examples, so as to be able to produce a useful output in new cases. Machine learning, like all subjects in artificial intelligence, requires cross-disciplinary proficiency in several areas, such as probability theory, statistics, pattern recognition, cognitive science, data mining, adaptive control, computational neuroscience and theoretical computer science [2]. In this paper we focus on learning algorithms for robot manipulators.

II. MACHINE LEARNING ALGORITHMS

Machine learning algorithms are organized into a taxonomy based on the desired outcome of the algorithm.

Supervised learning generates a function that maps inputs to desired outputs. For example, in a classification problem, the learner approximates a function mapping a vector into classes by looking at input-output examples of the function for robot manipulators [1].

Unsupervised learning models a set of inputs, like clustering [1].

Semi-supervised learning combines both labeled and unlabeled examples to generate an appropriate function or classifier in manipulators [3].

Reinforcement learning learns how to act given an observation of the world. Every action has some impact in the environment, and the environment provides feedback in the form of rewards that guides the learning algorithm [1] and [3].

Transduction tries to predict new outputs based on training inputs, training outputs, an...
