2014
DOI: 10.1007/978-3-319-12643-2_37

Stochastic Decision Making in Learning Classifier Systems through a Natural Policy Gradient Method

Cited by 2 publications (9 citation statements) · References 8 publications

“…Estimate $\nabla_{\theta_t} \log \pi$, $\nabla_{\mu_t} \log \pi$, and $\nabla_{\sigma_t} \log \pi$ through (13) and (14); $\delta_t$ can then be computed straightforwardly from (24). Once $\tilde{\nabla}_{\theta_t} \log \tilde{A}^{[M]}_{s_t}$ is approximated by using (26) and (27), we can proceed to approximate $\tilde{\nabla}_{\mu_t} \log \tilde{A}^{[M]}_{s_t}$ and $\tilde{\nabla}_{\sigma_t} \log \tilde{A}^{[M]}_{s_t}$ as well.…”
Section: A Policy Parameter Learning Component
confidence: 99%
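
The passage above combines Gaussian score-function estimates with a temporal-difference error $\delta_t$. As a rough sketch only (the cited equations (13), (14), (24), (26), (27) are not reproduced here), assuming a univariate Gaussian policy $\pi(a \mid s) = \mathcal{N}(\mu, \sigma^2)$, the log-policy gradients and $\delta_t$ could be computed as follows; the function names are hypothetical:

```python
import numpy as np

def gaussian_log_policy_grads(a, mu, sigma):
    # Score function of a univariate Gaussian policy N(mu, sigma^2):
    #   d/d_mu    log pi(a) = (a - mu) / sigma^2
    #   d/d_sigma log pi(a) = ((a - mu)^2 - sigma^2) / sigma^3
    d_mu = (a - mu) / sigma**2
    d_sigma = ((a - mu) ** 2 - sigma**2) / sigma**3
    return d_mu, d_sigma

def td_error(reward, v_s, v_s_next, gamma=0.99):
    # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    return reward + gamma * v_s_next - v_s
```
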
“…We have shown previously that it is often beneficial for an agent to learn stochastic policies or stochastic action selection strategies [21], [24]. For reinforcement learning in continuous spaces, a stochastic policy $\pi(s_t, a_t)$ at any time $t$ describes a continuous probability distribution, from which the actual action $a_t$ to be performed by the agent will be sampled.…”
Section: Reinforcement Learning In Continuous Spaces
confidence: 99%
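
To make the sampling step concrete, here is a minimal sketch assuming the stochastic policy is a state-conditioned Gaussian; `mu_s` and `sigma_s` are hypothetical stand-ins for whatever the policy parameterization outputs in state $s_t$:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_action(mu_s, sigma_s):
    # pi(s_t, a_t) is a continuous distribution over actions; here a
    # Gaussian N(mu(s_t), sigma(s_t)^2). The action actually performed
    # at time t is a single draw from that distribution.
    return rng.normal(loc=mu_s, scale=sigma_s)

a_t = sample_action(0.5, 0.1)  # e.g. mean action 0.5, std. dev. 0.1
```
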
“…In general, however, $G$ is a function of $\omega$. In the Euclidean space, learning of $\omega$ is carried out through the updating rule in (19). In the Riemannian space, on the other hand, according to [1], learning should be performed based on the natural gradient of $J$, i.e. $\tilde{\nabla}_{\omega_t} J$.…”
Section: LearnPolicyParameters()
confidence: 99%
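
For contrast with the Euclidean rule (19), a sketch of the Riemannian update: the natural gradient premultiplies the ordinary gradient by the inverse metric, $\tilde{\nabla}_{\omega} J = G(\omega)^{-1} \nabla_{\omega} J$, with $G$ playing the role of the Fisher information matrix in this setting. The helper below is hypothetical:

```python
import numpy as np

def natural_gradient_step(omega, grad_J, G, lr=0.01):
    # Natural-gradient ascent on J: omega <- omega + lr * G^{-1} grad_J.
    # Solving the linear system G x = grad_J avoids forming G^{-1}
    # explicitly, which is cheaper and numerically more stable.
    nat_grad = np.linalg.solve(G, grad_J)
    return omega + lr * nat_grad
```
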
“…Instead of learning deterministic policies, an agent that learns stochastic policies may easily escape from a local loop and hence potentially present a good solution to the above problem [19]. This idea can be realized by using, for example, a probabilistic action selection strategy where the chance of performing any action in a state is made proportional to its expected payoff (i.e.…”
Section: Introduction
confidence: 99%
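
The payoff-proportional strategy described above is essentially roulette-wheel selection. A minimal sketch, assuming non-negative payoff estimates (e.g. an LCS prediction array); the function name is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def select_action_proportional(payoffs):
    # P(action i) = payoff_i / sum_j payoff_j: higher-payoff actions
    # are favored, but every action keeps a nonzero probability, which
    # is what lets the agent escape local loops.
    p = np.asarray(payoffs, dtype=float)
    return rng.choice(len(p), p=p / p.sum())

action = select_action_proportional([10.0, 30.0, 60.0])
# chosen with probabilities 0.1, 0.3 and 0.6 respectively
```
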