“…In this paper, we use a policy-gradient method for learning efficient biped motion. The policy-gradient method is a kind of reinforcement learning method which maximizes the average reward with respect to parameters controlling action rules known as the policy (Shibata et al [2007], Tedrake et al [2004], Peters et al [2003]). In comparison with most standard value function-based reinforcement learning methods, this type of method has particular features suited to robotic applications.…”