2014 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2014.6907631

Combining learned controllers to achieve new goals based on linearly solvable MDPs

Abstract: Learning complicated behaviors usually involves intensive manual tuning and expensive computational optimization because we have to solve a nonlinear Hamilton-Jacobi-Bellman (HJB) equation. Recently, Todorov proposed the class of so-called linearly solvable Markov decision processes (LMDPs), which converts the nonlinear HJB equation into a linear differential equation. Linearity of the simplified HJB equation allows us to apply superposition to derive a new composite controller from a set of learned primitive controllers.
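The superposition the abstract refers to can be stated compactly. The following is a sketch in standard discrete-time LMDP notation following Todorov's framework, not quoted from this paper: with the desirability function $z(x) = e^{-v(x)}$, the Bellman equation becomes linear in $z$, and weighted blends of primitive solutions solve correspondingly blended tasks.

\[
z(x) = e^{-q(x)} \sum_{x'} p(x' \mid x)\, z(x'), \qquad z(x) \equiv e^{-v(x)}
\]
\[
e^{-g_{\mathrm{new}}(x)} = \sum_i w_i\, e^{-g_i(x)} \ \text{at terminal states} \ \Rightarrow\ z_{\mathrm{new}} = \sum_i w_i\, z_i .
\]

Here $q$ is the state cost, $p$ the passive dynamics, $v$ the value function, and $g_i$ the terminal costs of the learned primitives.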

Cited by 10 publications (4 citation statements) · References 24 publications
“…Note that the REINFORCE algorithm does not need Q_i. In addition, a deterministic stationary policy based on central pattern generators was prepared as prior knowledge, which was implemented by the modified Hopf oscillator (Uchibe and Doya, 2014). Since CRAIL uses multiple importance sampling, it is straightforward to use the deterministic policy as one of the sampling policies.…”
Section: Methods
confidence: 99%
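For context, the Hopf oscillator is the standard building block for such central-pattern-generator priors. The sketch below integrates the unmodified Hopf dynamics with Euler steps; the specific modification used by Uchibe and Doya (2014) and all parameter values here are illustrative assumptions, not taken from the excerpt.

```python
import numpy as np

def hopf_step(x, y, dt=0.01, mu=1.0, omega=2.0 * np.pi):
    """One Euler step of the standard Hopf oscillator.

    The limit cycle has radius sqrt(mu) and angular frequency omega.
    """
    r2 = x * x + y * y
    dx = (mu - r2) * x - omega * y
    dy = (mu - r2) * y + omega * x
    return x + dt * dx, y + dt * dy

# From an arbitrary start, the state converges to the unit limit cycle,
# giving a deterministic rhythmic signal usable as a CPG prior policy.
x, y = 0.1, 0.0
trajectory = []
for _ in range(2000):
    x, y = hopf_step(x, y)
    trajectory.append((x, y))
```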
“…RL problems requiring policies that solve several tasks at the same time are commonly stated as multiobjective or modular RL problems [12], [13]. The policies of all these subtasks may be combined using weights describing the predictability of the environmental dynamics [14], or using the values obtained from the desirability function in a linearly solvable control context [15]. Another alternative is to combine action-value functions of composable tasks and then extract a policy from this combined function [8].…”
Section: Related Work
confidence: 99%
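As an illustration of the desirability-based combination mentioned in [15], the sketch below blends primitive desirability vectors and recovers the LMDP-optimal controlled transition probabilities u*(x'|x) ∝ p(x'|x) z(x'). The function name, array layout, and toy numbers are assumptions for illustration, not the cited authors' API.

```python
import numpy as np

def composite_controller(P, z_list, w):
    """Blend primitive desirability vectors z_i with weights w_i and
    return the controlled transitions u*(x'|x) = p(x'|x) z(x') / G[z](x).

    P:      (n, n) passive dynamics, row-stochastic, P[x, x'] = p(x'|x)
    z_list: list of (n,) desirability vectors from learned primitives
    w:      iterable of nonnegative blend weights
    """
    z = sum(wi * zi for wi, zi in zip(w, z_list))  # composite desirability
    u = P * z[None, :]                             # reweight each successor state
    return u / u.sum(axis=1, keepdims=True)        # renormalize rows

# Toy usage: two primitives on a 3-state chain (numbers are made up).
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
z1, z2 = np.array([1.0, 0.5, 0.1]), np.array([0.1, 0.5, 1.0])
u = composite_controller(P, [z1, z2], [0.7, 0.3])
```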
“…Recently, there have been various discussions about using autoencoders to control robots (Noda et al., 2014; Finn et al., 2016; van Hoof et al., 2016; Kondo and Takahashi, 2017). Kullback-Leibler control (Todorov, 2009) is an interesting task-dependent approach to controlling robots by combining control policies (Uchibe and Doya, 2014; Matsubara et al., 2015).…”
Section: Introduction
confidence: 99%