2014
DOI: 10.1016/j.neunet.2014.06.006

Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation

Abstract: The goal of reinforcement learning (RL) is to let an agent learn an optimal control policy in an unknown environment so that future expected rewards are maximized. The model-free RL approach directly learns the policy based on data samples. Although using many samples tends to improve the accuracy of policy learning, collecting a large number of samples is often expensive in practice. On the other hand, the model-based RL approach first estimates the transition model of the environment and then learns the poli…
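The policy-update scheme named in the title, policy gradients with parameter-based exploration (PGPE), samples policy parameters from a hyper-distribution and updates that distribution along a Monte Carlo gradient of the expected return. The sketch below illustrates that idea on a hypothetical one-dimensional control task; the toy environment, the linear controller, and all hyperparameters are illustrative assumptions, not the paper's setup (which additionally estimates the transition model by least-squares conditional density estimation).

```python
# A minimal sketch of policy gradients with parameter-based exploration (PGPE):
# policy parameters (not actions) are sampled from a Gaussian hyper-distribution,
# each sample is evaluated by a rollout, and the hyper-distribution is updated
# along a Monte Carlo gradient of the expected return. The toy environment,
# controller, and hyperparameters are assumptions made for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, horizon=20):
    """Return of the deterministic controller u = theta * s on a toy 1-D system
    s' = s + u + noise, rewarded for staying near the origin."""
    s, ret = 1.0, 0.0
    for _ in range(horizon):
        u = theta * s
        s = s + u + 0.05 * rng.normal()
        ret += -s ** 2
    return ret

mu, sigma = 0.0, 1.0                       # hyper-distribution over the policy parameter
lr_mu, lr_sigma, n_samples = 0.1, 0.05, 50

for _ in range(200):
    thetas = rng.normal(mu, sigma, size=n_samples)     # sample parameters, not actions
    returns = np.array([rollout(t) for t in thetas])
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)   # baseline + scaling
    # log-likelihood-ratio gradients of the Gaussian hyper-parameters
    grad_mu = np.mean((thetas - mu) / sigma ** 2 * adv)
    grad_sigma = np.mean(((thetas - mu) ** 2 - sigma ** 2) / sigma ** 3 * adv)
    mu += lr_mu * grad_mu
    sigma = max(1e-3, sigma + lr_sigma * grad_sigma)

print("learned controller gain:", mu)      # typically moves toward roughly -1
```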

Cited by 15 publications (9 citation statements) | References 26 publications

Citation statements

“…Both the model-based PGPE [30] and the PILCO [13] algorithm use gradient-based policy updates. Rather than using Monte Carlo sampling, as in model-based PGPE, PILCO performs deterministic approximate inference by explicitly incorporating the model uncertainty into long-term predictions.…”
Section: Related Work (mentioning)
confidence: 99%
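To make the contrast drawn in the excerpt above concrete, the sketch below scores sampled policy parameters by Monte Carlo rollouts through a learned transition model rather than the real environment, which is the general flavour attributed to model-based PGPE; PILCO would instead propagate an approximate state distribution analytically. The linear least-squares model, the toy system, and every name here are illustrative assumptions, not the cited algorithms.

```python
# Hedged sketch: Monte Carlo rollouts through a *learned* model are used to
# evaluate sampled policy parameters. Toy data and a linear dynamics fit are
# assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(1)

# (1) Fit a crude stochastic model s' ~ N(w[0]*s + w[1]*u, var) from logged transitions.
S = rng.uniform(-2, 2, size=500)
U = rng.uniform(-1, 1, size=500)
S_next = 0.9 * S + U + 0.05 * rng.normal(size=500)       # "unknown" true dynamics
X = np.column_stack([S, U])
w, *_ = np.linalg.lstsq(X, S_next, rcond=None)           # least-squares dynamics fit
var = np.mean((S_next - X @ w) ** 2)                     # residual noise variance

def model_return(theta, horizon=20):
    """Monte Carlo rollout of u = theta * s through the learned model."""
    s, ret = 1.0, 0.0
    for _ in range(horizon):
        u = theta * s
        s = w[0] * s + w[1] * u + np.sqrt(var) * rng.normal()
        ret += -s ** 2
    return ret

# (2) Score parameters sampled from a PGPE-style hyper-distribution using only the model.
thetas = rng.normal(0.0, 1.0, size=30)
returns = [np.mean([model_return(t) for _ in range(5)]) for t in thetas]
print("best sampled gain under the model:", thetas[int(np.argmax(returns))])
```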
“…There exist many approaches to learn the models f and r (for model-based policy search) in the literature [47,113,179]. Most algorithms assume a known reward function; otherwise they usually use the same technique to learn both models.…”
Section: Model Learning (mentioning)
confidence: 99%
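As a deliberately simplified instance of the model-learning step described in this excerpt, the sketch below fits a transition model f and a reward model r from logged (s, a, r, s') tuples with the same regression technique, echoing the remark that both models are usually learned the same way. The features, data, and true functions are made-up assumptions.

```python
# Minimal sketch: fit transition model f and reward model r with the same
# least-squares technique on toy logged data. All details are illustrative.
import numpy as np

rng = np.random.default_rng(2)
S = rng.uniform(-2, 2, size=1000)
A = rng.uniform(-1, 1, size=1000)
S_next = 0.9 * S + A + 0.05 * rng.normal(size=1000)   # "unknown" true f
R = -S ** 2 - 0.1 * A ** 2                            # "unknown" true r

X = np.column_stack([S, A, S * A, S ** 2, A ** 2, np.ones_like(S)])  # simple features
w_f, *_ = np.linalg.lstsq(X, S_next, rcond=None)      # transition model f
w_r, *_ = np.linalg.lstsq(X, R, rcond=None)           # reward model r (same technique)

def f_hat(s, a):
    return np.array([s, a, s * a, s ** 2, a ** 2, 1.0]) @ w_f

def r_hat(s, a):
    return np.array([s, a, s * a, s ** 2, a ** 2, 1.0]) @ w_r

print("predicted next state and reward:", f_hat(1.0, -0.5), r_hat(1.0, -0.5))
```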
“…Policy Search with MBRL The standard approach consists in using a maximum likelihood estimation of the environment dynamics to perform simulations (or imaginary rollouts) through which a policy can be improved without further or with limited interactions with the environment (Deisenroth et al 2013). This approach has taken different forms, with the use of tabular models (Wang and Dietterich 2003), least-squares density estimation techniques (Tangkaratt et al 2014) or, more recently, combinations of variational generative models and recurrent neural networks employed in world models based on mixture density networks (Ha and Schmidhuber 2018). Several methods incorporate the model uncertainty into policy updates, by using Gaussian processes and moment matching approximations (Deisenroth and Rasmussen 2011), Bayesian neural networks (Gal, McAllister, and Rasmussen 2016) or ensembles of forward models (Chua et al 2018;Kurutach et al 2018;Janner et al 2019;Buckman et al 2018).…”
Section: Related Work (mentioning)
confidence: 99%
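One way to incorporate model uncertainty of the kind mentioned at the end of this excerpt is an ensemble of forward models fitted on bootstrapped data, with imaginary rollouts picking a random ensemble member at each step. The sketch below is a toy illustration of that idea under assumed linear models; it is not a reproduction of any cited system.

```python
# Hedged sketch: a bootstrap ensemble of linear forward models, used for
# uncertainty-aware imaginary rollouts. Toy data and models, for illustration only.
import numpy as np

rng = np.random.default_rng(3)
S = rng.uniform(-2, 2, size=400)
A = rng.uniform(-1, 1, size=400)
S_next = 0.9 * S + A + 0.05 * rng.normal(size=400)    # "unknown" true dynamics

ensemble = []
for _ in range(5):                                    # fit each member on a bootstrap sample
    idx = rng.integers(0, len(S), size=len(S))
    X = np.column_stack([S[idx], A[idx]])
    w, *_ = np.linalg.lstsq(X, S_next[idx], rcond=None)
    ensemble.append(w)

def imaginary_rollout(theta, horizon=20):
    """Roll out u = theta * s, querying a randomly chosen model at each step."""
    s, ret = 1.0, 0.0
    for _ in range(horizon):
        u = theta * s
        w = ensemble[rng.integers(len(ensemble))]
        s = w[0] * s + w[1] * u
        ret += -s ** 2
    return ret

print("imagined return for theta = -0.9:", imaginary_rollout(-0.9))
```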