2010
DOI: 10.1007/s10994-010-5223-6

Policy search for motor primitives in robotics

Abstract: Many motor skills in humanoid robotics can be learned using parametrized motor primitives. While successful applications to date have been achieved with imitation learning, most of the interesting motor learning problems are high-dimensional reinforcement learning problems. These problems are often beyond the reach of current reinforcement learning methods. In this paper, we study parametrized policy search methods and apply these to benchmark problems of motor primitive learning in robotics. We show that many…
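For readers unfamiliar with what "parametrized motor primitive" means concretely, the sketch below shows an Ijspeert-style dynamic movement primitive for one degree of freedom, the representation this line of work commonly builds on. It is an illustrative reconstruction, not code from the paper; the gains, the basis-function layout, and the function name are assumptions.

```python
import numpy as np

def dmp_rollout(w, y0=0.0, g=1.0, tau=1.0, dt=0.001,
                alpha_z=25.0, beta_z=6.25, alpha_x=8.0):
    """Integrate one degree of freedom of a discrete movement primitive.

    w : (K,) weights of the Gaussian basis functions -- these are the open
        parameters that imitation learning or policy search would adapt.
    """
    K = len(w)
    # Basis-function centers spread along the exponentially decaying phase x.
    centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, K))
    widths = 1.0 / (np.diff(centers, append=0.0) ** 2 + 1e-8)
    x, y, z = 1.0, y0, 0.0   # phase, position, scaled velocity
    traj = []
    for _ in range(int(tau / dt)):
        psi = np.exp(-widths * (x - centers) ** 2)
        # Learnable forcing term; it vanishes as the phase x decays to zero,
        # so the movement is guaranteed to converge to the goal g.
        f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)
        z += dt / tau * (alpha_z * (beta_z * (g - y) - z) + f)
        y += dt / tau * z
        x += dt / tau * (-alpha_x * x)
        traj.append(y)
    return np.array(traj)
```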

Cited by 294 publications (284 citation statements) · References 31 publications
“…There exist several algorithms which use probabilistic inference techniques for computing the policy update in reinforcement learning (Dayan and Hinton 1993; Theodorou et al. 2010; Kober and Peters 2010; Peters et al. 2010). More formally, they either re-weight state-action trajectories or state-action pairs according to the estimated quality of the state-action pair and, subsequently, use a weighted maximum likelihood estimate to obtain the parameters of a new policy π*.…”
Section: Probabilistic Reinforcement Learning Algorithms (mentioning; confidence: 99%)
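The re-weighting-plus-weighted-ML scheme this excerpt describes can be sketched in a few lines. The snippet below is an illustrative reconstruction, not code from any of the cited papers; the Gaussian search distribution, the exponential re-weighting, and the names (reward_weighted_update, eta) are assumptions.

```python
import numpy as np

def reward_weighted_update(thetas, returns, eta=1.0):
    """One EM-like policy update: re-weight sampled policy parameters by
    their episodic return, then fit a new Gaussian by weighted max. likelihood.

    thetas  : (N, d) policy parameters sampled from the old policy
    returns : (N,) episodic returns, one per sampled rollout
    eta     : temperature; smaller values re-weight more greedily
    """
    # Exponential re-weighting of samples by estimated quality; subtracting
    # the best return keeps the exponentials numerically stable.
    w = np.exp((returns - returns.max()) / eta)
    w /= w.sum()
    # Weighted maximum-likelihood estimate of the new Gaussian policy.
    mean = w @ thetas
    centered = thetas - mean
    cov = (w[:, None] * centered).T @ centered
    return mean, cov

# Usage sketch: sample parameters, roll them out, update, repeat.
rng = np.random.default_rng(0)
thetas = rng.normal(size=(50, 10))      # 50 sampled parameter vectors
returns = -np.sum(thetas**2, axis=1)    # stand-in for episodic returns
mean, cov = reward_weighted_update(thetas, returns)
```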
“…The parameter η is a temperature parameter that is either optimized by the algorithm (Daniel et al. 2012) or manually set (Theodorou et al. 2010; Kober and Peters 2010). A new parametrized policy π* can then be obtained by minimizing the expected Kullback-Leibler divergence between the re-weighted policy update p(a|s) and the new parametric policy π* (van Hoof et al. 2015), i.e.,…”
Section: Probabilistic Reinforcement Learning Algorithms (mentioning; confidence: 99%)
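The quotation breaks off at "i.e.,"; for context, a plausible reconstruction of the update it is describing is sketched below. The symbols q for the old policy and Q for the quality estimate are assumptions; only the temperature η and the KL form are taken from the excerpt.

```latex
% Soft-max re-weighting of actions by their estimated quality, with
% temperature \eta (assumed functional form; the excerpt is truncated):
p(a \mid s) \;\propto\; q(a \mid s)\,\exp\!\bigl(Q(s,a)/\eta\bigr)

% The new parametric policy minimizes the expected KL divergence to p,
% which (up to per-state normalization) is a weighted ML problem:
\pi^{*} \;=\; \arg\min_{\pi}\; \mathbb{E}_{s}\Bigl[\mathrm{KL}\bigl(p(\cdot \mid s)\,\big\|\,\pi(\cdot \mid s)\bigr)\Bigr]
        \;=\; \arg\max_{\pi}\; \mathbb{E}_{s,\,a \sim q}\Bigl[\exp\!\bigl(Q(s,a)/\eta\bigr)\,\log \pi(a \mid s)\Bigr]
```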
“…In most applications to date, only a single motion primitive is used for the whole movement. Parametrized policy search methods such as policy gradient descent and EM-like policy updates (Kober & Peters, 2009) have been used in order to improve single-stroke motor primitives.…”
Section: Introduction (mentioning; confidence: 99%)