1996
DOI: 10.1613/jair.301

Reinforcement Learning: A Survey

Abstract: This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in th…

Cited by 6,374 publications (3,463 citation statements)
References 93 publications
“…RL as a science is relatively young and has already made a considerable impact on operations research. The optimism expressed about RL in the early surveys (Keerthi and Ravindran, 1994; Kaelbling et al, 1996; Mahadevan, 1996) has been bolstered by several success stories.…”
Section: Results
mentioning
confidence: 99%
“…A value system can be used for regulating behavior and modulating learning (McFarland & Boesser, 1993; Pfeifer & Scheier, 1998). Typically, in reinforcement learning approaches (e.g., Kaelbling, Littman, & Moore, 1996) and other adaptive models, such as map learning (e.g., Burgess, Recce, & O'Keefe, 1994), the value system is externally imposed by the experimenter. For example, certain sensory configurations or locations of the environment are associated with positive rewards or can trigger synaptic changes.…”
Section: Discussion
mentioning
confidence: 99%
“…In contrast to other reinforcement learners, policy iterators directly manipulate the policy π. Another example for policy iterators are evolutionary algorithms [31].…”
Section: Taxonomy Of Supervised Learning Algorithms
mentioning
confidence: 99%
“…In contrast to other reinforcement learners, policy iterators directly manipulate the policy π. Another example for policy iterators are evolutionary algorithms [31]. Lazy learning: In artificial intelligence, lazy learning is a learning method in which generalization beyond the training data is delayed until a query is made to the system, as opposed to in eager learning, where the system tries to generalize the training data before receiving queries. The main advantage gained in employing a lazy learning method, such as Case based reasoning [19], is that the target function will be approximated locally, such as in the k-nearest neighbor algorithm. Because the target function is approximated locally for each query to the system, lazy learning systems can simultaneously solve multiple problems and deal successfully with changes in the problem domain.…”
mentioning
confidence: 99%