Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the other extreme of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word "big-data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or of the dynamics (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots, designing generic priors, and optimizing the computing time.

1. In some rare cases, a process can be "optimally efficient".
2. It is challenging to put a precise limit on "micro-data learning", as each domain has different experimental constraints; this is why we refer in this article to "a few minutes" or "a few trials". The commonly used word "big-data" has a similarly "fuzzy" limit that depends on the exact domain.
3. Planning-based and model-predictive control methods [59] do not search for policy parameters, which is why they do not fit into the scope of this paper. Although trajectory-based policies and planning-based methods share the same goal, they usually search in a different space: planning algorithms search in the state-action space (e.g., joint positions/velocities), whereas policy methods search for the optimal parameters of the policy, which can encode a trajectory (e.g., with dynamic movement primitives).

This is basically sampling the distribution over trajectories, P(τ|θ), which is feasible since the sampling is performed with the models. When applying the same policy (i.e., a policy with the same parameters θ), the trajectories τ (and consequently the rewards r) can differ from one rollout to the next, since the system, the learned models, and/or the policy may be stochastic.
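To make the trajectory-sampling idea concrete, here is a minimal Python sketch of a Monte Carlo estimate of the expected return under a learned model: the policy is rolled out several times through the model (never on the real robot), and the resulting returns are averaged. The interfaces `model.sample_next`, `policy`, and `reward_fn` are illustrative assumptions for this sketch, not an API defined in the survey.

```python
import numpy as np

def rollout_return(model, policy, reward_fn, theta, s0, horizon, rng):
    """Sample one trajectory tau ~ P(tau | theta) with the learned model
    and return its cumulative reward."""
    s, total = s0, 0.0
    for _ in range(horizon):
        a = policy(s, theta)              # policy may be deterministic or stochastic
        s = model.sample_next(s, a, rng)  # sample s_{t+1} from the probabilistic model
        total += reward_fn(s, a)
    return total

def monte_carlo_return(model, policy, reward_fn, theta, s0, horizon,
                       n_samples=20, seed=0):
    """Monte Carlo estimate of the expected return J(theta): average the returns
    of n_samples trajectories sampled with the model instead of the real system."""
    rng = np.random.default_rng(seed)
    returns = [rollout_return(model, policy, reward_fn, theta, s0, horizon, rng)
               for _ in range(n_samples)]
    return np.mean(returns), np.std(returns)
```

Because each rollout samples a different τ from P(τ|θ), the standard deviation returned alongside the mean gives a rough picture of how much the stochasticity of the model and of the policy affects the estimated return.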