2009
DOI: 10.1007/s10514-009-9132-0

Learning model-free robot control by a Monte Carlo EM algorithm

Abstract: We address the problem of learning robot control by model-free reinforcement learning (RL). We adopt the probabilistic model of Vlassis and Toussaint (2009) for model-free RL, and we propose a Monte Carlo EM algorithm (MCEM) for control learning that searches directly in the space of controller parameters using information obtained from randomly generated robot trajectories. MCEM is related to, and generalizes, the PoWER algorithm of Kober and Peters (2009). In the finite-horizon case MCEM reduces precisely to…
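The abstract describes an EM-style update in which randomly perturbed controller parameters are weighted by the returns of the resulting trajectories. The following is a minimal sketch of that idea, not the authors' implementation: the toy reward, the linear-Gaussian controller, and all variable names are assumptions made for illustration.

```python
import numpy as np

# Hedged sketch of a Monte Carlo EM / PoWER-style policy update:
# sample perturbed controller parameters, roll them out, and take a
# return-weighted average as the new parameter estimate.
rng = np.random.default_rng(0)

STATE_DIM, HORIZON, N_ROLLOUTS, N_ITERS = 2, 20, 50, 30
SIGMA = 0.3                      # exploration noise std dev (assumed)
theta = np.zeros(STATE_DIM)      # controller parameters
target = np.array([1.0, -0.5])   # hidden "good" parameters of the toy task

def rollout(theta_sample):
    """Return the episodic return of one rollout under a toy reward."""
    ret = 0.0
    for _ in range(HORIZON):
        s = rng.normal(size=STATE_DIM)   # random state
        u = theta_sample @ s             # action from perturbed parameters
        u_star = target @ s              # action of the ideal controller
        ret += -(u - u_star) ** 2        # negative squared action error
    return ret

for _ in range(N_ITERS):
    # E-step (Monte Carlo): sample perturbed parameters and collect returns.
    samples = theta + SIGMA * rng.normal(size=(N_ROLLOUTS, STATE_DIM))
    returns = np.array([rollout(th) for th in samples])

    # Treat the (transformed) return as an improper probability weight.
    w = np.exp((returns - returns.max()) / (returns.std() + 1e-8))
    w /= w.sum()

    # M-step: reward-weighted average of the sampled parameters.
    theta = w @ samples

print("learned parameters:", theta)
```

The weighting step is only one possible transform of the returns; the key point, shared by MCEM and PoWER, is that better-scoring parameter samples pull the next estimate toward themselves without ever building a model of the dynamics.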

Cited by 51 publications (34 citation statements)
References 17 publications
“…Our novel algorithm, PoWER, is based on an expectation-maximization inspired optimization and a structured, state-dependent exploration. Our approach has already given rise to follow-up work in other contexts, for example, [Vlassis et al., 2009, Kormushev et al., 2010]. Theodorou et al. [2010] have shown that an algorithm very similar to PoWER can also be derived from a completely different perspective, that is, the path integral approach.…”
Section: Policy Learning by Weighting Exploration with the Returns (PoWER)
confidence: 96%
“…When the reward is treated as an improper probability distribution [Dayan and Hinton, 1997], safe and fast methods can be derived that are inspired by expectation-maximization. Some of these approaches have proven successful in robotics, e.g., reward-weighted regression [Peters and Schaal, 2008b], Policy Learning by Weighting Exploration with the Returns, Monte Carlo Expectation-Maximization [Vlassis et al., 2009], Cost-regularized Kernel Regression, and Policy Improvements with Path Integrals [Theodorou et al., 2010]. An overview of publications using policy search methods is presented in Table 2.2.…”
Section: Policy Search
confidence: 99%
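The statement above groups MCEM with other EM-inspired policy search methods such as reward-weighted regression. Below is a minimal sketch of that reward-weighted regression idea, assuming a linear-Gaussian policy and a toy one-dimensional action; it is our illustration, not code from any of the cited papers.

```python
import numpy as np

# Hedged sketch of reward-weighted regression: sample exploratory actions,
# convert rewards into positive weights, and refit the policy parameters by
# weighted least squares. Task, reward transform, and names are assumptions.
rng = np.random.default_rng(1)

STATE_DIM, N_SAMPLES, N_ITERS, SIGMA = 3, 200, 25, 0.5
theta = np.zeros(STATE_DIM)            # mean of the linear-Gaussian policy
target = np.array([0.8, -1.2, 0.3])    # hidden optimal linear controller

for _ in range(N_ITERS):
    S = rng.normal(size=(N_SAMPLES, STATE_DIM))          # sampled states
    U = S @ theta + SIGMA * rng.normal(size=N_SAMPLES)   # exploratory actions
    reward = -(U - S @ target) ** 2                      # per-sample reward
    w = np.exp(reward - reward.max())                    # improper "probability"

    # M-step: weighted least squares, argmin_theta sum_i w_i (u_i - theta^T s_i)^2
    A = S.T @ (w[:, None] * S)
    b = S.T @ (w * U)
    theta = np.linalg.solve(A, b)

print("fitted policy parameters:", theta)
```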
“…Interestingly, the EM algorithm family along with Monte Carlo sampling has also been used in [65] for model-free reinforcement learning, in order to obtain directly the policy without learning the model first. In [40], the authors use this kind of technique to optimize motion of a robot in order to minimize uncertainty of localization.…”
Section: Active Exploration While Learning
confidence: 99%