1999
DOI: 10.1613/jair.613

Evolutionary Algorithms for Reinforcement Learning

Abstract: There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary algorithms to the reinforcement learning problem, emphasizing alternative policy representations, credit a…
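
The abstract's distinction can be illustrated with a small, hedged sketch (not code from the paper): temporal-difference methods search value-function space by updating value estimates from observed transitions, whereas evolutionary algorithms search policy space directly (see the genetic-algorithm sketch further below). The state names and constants here are illustrative assumptions.

    # TD(0): searching value-function space by bootstrapping from transitions.
    ALPHA, GAMMA = 0.1, 0.99   # illustrative learning rate and discount factor

    def td0_update(V, state, reward, next_state):
        # Move V(state) toward the one-step bootstrapped target.
        V[state] += ALPHA * (reward + GAMMA * V[next_state] - V[state])

    # Usage with hypothetical states:
    V = {"s0": 0.0, "s1": 0.0}
    td0_update(V, "s0", reward=1.0, next_state="s1")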

Cited by 252 publications (144 citation statements)
References 43 publications

“…The probability of successful strategies is increased (see also Moriarty and Schultz, 1999). • Genetic algorithm: Genetic algorithms are oriented toward the concept of evolution. The available strategy space is broken down into strategies consisting of small segments (genes), e.g. the volume bid into the market.…”
Section: Models With Focus On Agent Decisions and Learning
confidence: 99%
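
The quoted passage describes the standard genetic-algorithm loop. As a concrete illustration, here is a minimal Python sketch under stated assumptions: strategies are fixed-length lists of numeric "genes", and the fitness function is a placeholder standing in for the return of executing a strategy in the environment (e.g. market profit). All names and parameters are illustrative, not taken from the cited paper.

    import random

    GENOME_LEN = 8       # segments (genes) per strategy, e.g. bid volumes
    POP_SIZE = 20
    MUTATION_RATE = 0.1

    def random_strategy():
        return [random.uniform(0.0, 1.0) for _ in range(GENOME_LEN)]

    def fitness(strategy):
        # Placeholder for the return of running the strategy in the environment.
        return sum(strategy)

    def select(population):
        # Fitness-proportional selection: successful strategies are
        # more likely to reproduce.
        weights = [fitness(s) for s in population]
        return random.choices(population, weights=weights, k=2)

    def crossover(a, b):
        # One-point crossover recombines gene segments of two parents.
        point = random.randrange(1, GENOME_LEN)
        return a[:point] + b[point:]

    def mutate(strategy):
        return [random.uniform(0.0, 1.0) if random.random() < MUTATION_RATE else g
                for g in strategy]

    population = [random_strategy() for _ in range(POP_SIZE)]
    for _ in range(50):   # generations
        population = [mutate(crossover(*select(population)))
                      for _ in range(POP_SIZE)]
    best = max(population, key=fitness)
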
“…There are many studies that combine evolutionary algorithms and RL [9,10]. Although the approaches differ from our proposed technique, we see several studies in which GP and RL are combined [7,8].…”
Section: Related Work
confidence: 99%
“…Examples can be found in [37] and [18]; later work also considered optimizing the network structure [47] or using recurrent neural networks to better cope with hidden states [19]. A recent comparison of these methods can be found in [51] and [26].…”
Section: Direct Policy Search
confidence: 99%
“…However, a major issue in DPS is that the final performance strongly depends on the choice of an appropriate policy representation. Common policy representations include linear parametrizations [22], neural networks [37,47,19], or radial basis functions [6,14], and these typically have hyper-parameters that require tuning (e.g. the number of hidden neurons).…”
Section: Introduction
confidence: 99%
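
To make the representation choice concrete, here is a hedged sketch of two of the representations named in the quote, a linear parametrization and a radial-basis-function (RBF) policy. The shared act interface and the hyper-parameter names (n_centers, width) are illustrative assumptions; the cited works differ in their details.

    import numpy as np

    class LinearPolicy:
        def __init__(self, obs_dim, act_dim):
            # The weight matrix is the parameter set searched by DPS.
            self.W = np.zeros((act_dim, obs_dim))

        def act(self, obs):
            return self.W @ obs

    class RBFPolicy:
        def __init__(self, obs_dim, act_dim, n_centers=10, width=1.0):
            # n_centers and width are exactly the kind of hyper-parameters
            # the quote notes typically require tuning.
            self.centers = np.random.randn(n_centers, obs_dim)
            self.width = width
            self.W = np.zeros((act_dim, n_centers))

        def act(self, obs):
            dists = np.linalg.norm(self.centers - obs, axis=1)
            features = np.exp(-(dists / self.width) ** 2)
            return self.W @ features

    # Usage with hypothetical dimensions:
    policy = RBFPolicy(obs_dim=4, act_dim=2)
    action = policy.act(np.zeros(4))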