Humanoid Robots, Human-Like Machines 2007
DOI: 10.5772/4804

Reinforcement Learning of Stable Trajectory for Quasi-Passive Dynamic Walking of an Unstable Biped Robot

Cited by 3 publications (2 citation statements)
References 16 publications
“…The proposed reinforcement learning structure is based on policy-gradient methods (Peters et al. [2003], Shibata et al. [2007], Tedrake et al. [2004]). The policy-gradient method is a stochastic gradient-descent method.…”
Section: Compensator of Dynamic Reactions Based on RL Structure
Mentioning, confidence: 99%
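For context, the stochastic gradient update these statements refer to can be sketched in standard textbook notation (not notation taken from the cited paper): the policy parameters \theta are adjusted along a sampled estimate of the gradient of the expected return J(\theta),

\theta_{k+1} = \theta_k + \alpha \,\widehat{\nabla_\theta J}(\theta_k),
\qquad
J(\theta) = \mathbb{E}\!\left[\sum_t r_t \,\middle|\, \pi_\theta\right],

where \pi_\theta is the parameterized policy, \alpha is a step size, and \widehat{\nabla_\theta J} is a noisy estimate of the gradient obtained from sampled trajectories.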
“…In this paper, we use a policy-gradient method for learning efficient biped motion. The policy-gradient method is a kind of reinforcement learning method that maximizes the average reward with respect to the parameters controlling the action rule, known as the policy (Shibata et al. [2007], Tedrake et al. [2004], Peters et al. [2003]). In comparison with most standard value-function-based reinforcement learning methods, this type of method has features that are particularly well suited to robotic applications.…”
Section: Introduction
Mentioning, confidence: 99%
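To make the policy-gradient idea in the quoted statement concrete, below is a minimal, self-contained Python sketch of a REINFORCE-style update on a toy one-step problem. It is illustrative only: the Gaussian policy, the quadratic toy reward, and all parameter values are assumptions made for this example, not the controller, reward, or walking task used in the cited work.

# Minimal REINFORCE-style policy-gradient sketch (illustrative toy example,
# not the controller from the cited paper). A Gaussian policy with a
# learnable mean is improved by stochastic gradient ascent on expected reward.
import numpy as np

rng = np.random.default_rng(0)

theta = 0.0      # policy parameter: mean of the Gaussian action distribution
sigma = 0.5      # fixed exploration noise (standard deviation)
alpha = 0.05     # learning rate
target = 1.3     # hypothetical "good" action; the toy reward peaks here
baseline = 0.0   # running reward baseline to reduce gradient variance

def reward(action):
    # Toy reward: larger (closer to zero) when the action is near the target.
    return -(action - target) ** 2

for episode in range(2000):
    # Sample an action from the stochastic policy pi_theta(a) = N(theta, sigma^2).
    action = rng.normal(theta, sigma)
    r = reward(action)

    # REINFORCE estimator: grad_theta log pi_theta(a) * (r - baseline).
    # For a Gaussian with mean theta: grad log pi = (a - theta) / sigma^2.
    grad_log_pi = (action - theta) / sigma**2
    theta += alpha * grad_log_pi * (r - baseline)  # stochastic gradient ascent
    baseline += 0.01 * (r - baseline)              # slowly track average reward

print(f"learned mean action: {theta:.3f} (target was {target})")

The key line is the parameter update, which is exactly the stochastic gradient step on the expected reward that the quoted statements describe; the running baseline is a common variance-reduction choice and does not change the expected direction of the update.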