2014
DOI: 10.3182/20140824-6-za-1003.01987

A particle-based policy for the optimal control of Markov decision processes

Cited by 2 publications (2 citation statements)
References 12 publications
“…• Introduction of a particle-based representation of the control policy for MDPs with finite control space; • Extension of the PGPE to this representation, which also includes a categorical distribution for the selection of the control action; • Design of an iterative procedure for the structure selection of the particle-based policy parametrization. A preliminary version of this work has been presented in [30], and is here extended and generalized, in particular by removing the limitation that the policy be characterized statically by a fixed number of particles with pre-assigned action.…”
Section: Introduction (mentioning)
confidence: 99%
“…The advantages are: I) less noisy estimate both from theoretical and empirical analysis [7]; II) possibility to exploit non differentiable policies. Furthermore, PGPE has been shown to outperform standard RL gradient approaches in many complex scenarios [8], [9].…”
Section: Introduction (mentioning)
confidence: 99%