2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids)
DOI: 10.1109/humanoids.2017.8246914

Trial-and-error learning of repulsors for humanoid QP-based whole-body control

Abstract: Whole-body controllers based on quadratic programming allow humanoid robots to achieve complex motions. However, they rely on the assumption that the model perfectly captures the dynamics of the robot and its environment, whereas even the most accurate models are never perfect. In this paper, we introduce a trial-and-error learning algorithm that allows whole-body controllers to operate in spite of inaccurate models, without needing to update these models. The main idea is to encourage the controller …
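The abstract describes the approach only at a high level (the full text is truncated here). As a rough illustration of the general idea, the sketch below layers an episodic trial-and-error search over repulsor parameters on top of a fixed QP whole-body controller. This is not the authors' algorithm; the parameterization, the search method, and the `rollout_cost` function are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code): episodic trial-and-error search over
# repulsor parameters layered on a fixed QP whole-body controller.
# `rollout_cost` is a hypothetical stand-in for running the (imperfect-model)
# QP controller with the given repulsors and scoring the resulting motion.
import numpy as np

rng = np.random.default_rng(0)

def rollout_cost(repulsors):
    """Stand-in for a real rollout; here a made-up quadratic cost."""
    target = np.array([0.3, -0.1, 0.8, 1.0])  # fictitious optimum
    return float(np.sum((repulsors - target) ** 2))

def learn_repulsors(n_trials=30, dim=4, sigma=0.2):
    """Simple (1+1)-style stochastic hill climbing over repulsor parameters
    (e.g., a 3-D repulsor location plus a gain)."""
    best = rng.normal(size=dim)
    best_cost = rollout_cost(best)
    for _ in range(n_trials):
        candidate = best + sigma * rng.normal(size=dim)  # perturb and test
        cost = rollout_cost(candidate)
        if cost < best_cost:                             # keep improvements
            best, best_cost = candidate, cost
    return best, best_cost

params, cost = learn_repulsors()
print(f"best repulsor parameters: {params}, cost: {cost:.4f}")
```

The point of such a scheme is that the model inside the QP controller is never updated; only the repulsor parameters are adapted from observed trial outcomes.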

Cited by 8 publications (13 citation statements). References 41 publications.
“…Trajectory-based policy types have been widely used in the robot learning literature [93,166,171,183,184], and especially within the policy search problem for robotics [75,76,171]. This type of policies are well-suited for several typical classes of tasks in robotics, such as point-to-point movements or repetitive movements.…”
Section: Trajectory-based Policies (mentioning)
confidence: 99%
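To make the quoted notion of a trajectory-based policy concrete, here is a minimal sketch of one common form: a point-to-point joint trajectory parameterized by a few radial-basis-function weights, the kind of low-dimensional representation that policy search typically tunes. The basis width and endpoint handling are illustrative choices, not taken from the cited works.

```python
# Sketch of a trajectory-based policy: a point-to-point joint trajectory
# parameterized by a small vector of RBF weights (the learnable part).
import numpy as np

def rbf_trajectory(weights, q_start, q_goal, n_steps=100):
    """Desired positions over time: a straight ramp from q_start to q_goal
    plus an RBF-weighted shape term that vanishes at both endpoints."""
    t = np.linspace(0.0, 1.0, n_steps)
    centers = np.linspace(0.0, 1.0, len(weights))
    basis = np.exp(-((t[:, None] - centers[None, :]) ** 2) / 0.02)
    shape = basis @ weights                  # learned deviation from the ramp
    ramp = q_start + t * (q_goal - q_start)
    return ramp + shape * t * (1.0 - t)      # zero deviation at start and goal

traj = rbf_trajectory(np.array([0.1, -0.2, 0.05]), q_start=0.0, q_goal=1.0)
print(traj.shape)  # (100,) desired positions for one joint
```

With only a handful of weights per joint, such policies keep the search space small, which is why they suit point-to-point and repetitive movements.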
“…Such a low-dimensional policy is an important, taskspecific prior that constrains what can be learnt. For example, central pattern generators can be used for rhythmic tasks such as locomotion [78], but they are unlikely to work well for a manipulation task; similarly, quadratic programming-based controllers (and in general model-based controllers) can facilitate learning whole body controllers for humanoid robots [104,166], but they impose the control strategy and the model. In summary, model-based policy search algorithms scale well with the dimensionality of the policy, but they do not scale with the dimensionality of the state space; and direct policy search algorithms scale well with the dimensionality of the state-space, but not with the dimensionality of the policy.…”
Section: Scalability (mentioning)
confidence: 99%
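The quote contrasts task-specific policy priors; a central pattern generator is a good example of how strong such a prior can be. The sketch below produces rhythmic two-joint setpoints from just three parameters (amplitude, frequency, phase offset), which suits locomotion but, as the quote notes, would be a poor fit for manipulation. All parameter names are illustrative.

```python
# Sketch of a central pattern generator as a low-dimensional rhythmic policy:
# two phase-shifted oscillators driven by three scalar parameters.
import numpy as np

def cpg_setpoints(amplitude, freq_hz, phase_offset, duration=2.0, dt=0.01):
    """Return rhythmic setpoints for two joints from coupled phases."""
    t = np.arange(0.0, duration, dt)
    phase = 2.0 * np.pi * freq_hz * t
    left = amplitude * np.sin(phase)                   # joint 1
    right = amplitude * np.sin(phase + phase_offset)   # joint 2, phase-shifted
    return np.stack([left, right], axis=1)

gait = cpg_setpoints(amplitude=0.4, freq_hz=1.5, phase_offset=np.pi)
print(gait.shape)  # (200, 2): an anti-phase stepping pattern
```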
“…Solutions to this problem have recently been addressed by trial-and-error algorithms [10], [11]. In [10], prior knowledge from simulations was exploited to find acceptable behaviors on the real robot, in few trials.…”
Section: Introduction (mentioning)
confidence: 99%
“…In [10], prior knowledge from simulations was exploited to find acceptable behaviors on the real robot, in few trials. In [11], a trial-and-error learning algorithm encouraged exploration of the task space, allowing adaptation to inaccurate models, also in few trials.…”
Section: Introduction (mentioning)
confidence: 99%
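The scheme attributed to [10], exploiting a simulation prior to adapt in few real trials, can be illustrated with a simple greedy loop: behaviors are pre-scored in simulation, and a handful of real trials replace those scores with measured ones before committing to a behavior. This is an illustrative sketch, not the cited algorithms verbatim; the sizes, the hidden sim-to-real gap, and `real_trial` are all made up.

```python
# Illustrative few-trial adaptation with a simulation prior, in the spirit of
# [10]: simulated scores guide which behaviors to test on the real robot.
import numpy as np

rng = np.random.default_rng(1)

sim_scores = rng.uniform(0.0, 1.0, size=50)  # prior performance map (simulated)
real_bias = -0.3 * rng.uniform(size=50)      # hidden sim-to-real gap (unknown)

def real_trial(i):
    """Stand-in for executing behavior i once on the real robot."""
    return sim_scores[i] + real_bias[i]

corrections = np.zeros(50)
tried = set()
for _ in range(5):                           # only a few real trials
    i = int(np.argmax(sim_scores + corrections))
    if i in tried:
        break                                # current best already verified
    tried.add(i)
    corrections[i] = real_trial(i) - sim_scores[i]  # replace prior with data

best = int(np.argmax(sim_scores + corrections))
print("selected behavior:", best)
```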