Proceedings of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation 2017
DOI: 10.1145/3099564.3099567

Learning locomotion skills using DeepRL

Abstract: The use of deep reinforcement learning allows for high-dimensional state descriptors, but little is known about how the choice of action representation impacts the learning difficulty and the resulting performance. We compare the impact of four different action parameterizations (torques, muscle-activations, target joint angles, and target joint-angle velocities) in terms of learning time, policy robustness, motion quality, and policy query rates. Our results are evaluated on a gait-cycle imitation task for mul…

Cited by 110 publications (22 citation statements)
References 22 publications

“…Some of these examples in Imitation Learning are shown in Figure 3. The choice of action space is also shown to have an impact on the speed and quality of imitation‐based learning [PvdP17].…”
Section: Skeletal Animation (mentioning)
confidence: 99%
“…Early works in human animation already considered kinematic constraints such as foot contacts [BB98, LCR * XLKvdP20,YTL18] where ground reaction forces are explicitly modelled in the physics engine, and also relates to foot-placement strategies that are a real challenge for locomotion policies [PvdP17]. In the following, we overview existing approaches for foot contacts labels detection (2.1) and ground reaction forces estimation (2.2), as well as existing databases of motion data labelled with information on foot contacts (2.3).…”
Section: Related Work (mentioning)
confidence: 99%
“…The output action a_t of the control policy at each time step (running at 40 Hz) is an 11-dimensional vector with the first 10 entries corresponding to PD targets for the joints, each of which are fed into a PD controller for each joint operating at 2 kHz. Prior work has found it advantageous to learn actions in the PD target space rather than directly learning the higher-rate actuation commands [15].…”
Section: B Action Space (mentioning)
confidence: 99%
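
The two-rate control scheme described in the statement above (a policy emitting PD targets at 40 Hz, with per-joint PD controllers running at 2 kHz) can be illustrated with a minimal Python sketch. This is not the cited authors' implementation: the single-joint unit-inertia model, the gains `kp` and `kd`, the constant-output `toy_policy`, and all function names are assumptions chosen only to show how one low-rate action is held fixed across many high-rate torque updates.

```python
# Illustrative sketch only: one joint, hypothetical gains and dynamics.
POLICY_HZ = 40                      # policy query rate (from the quoted statement)
PD_HZ = 2000                        # PD controller rate (from the quoted statement)
SUBSTEPS = PD_HZ // POLICY_HZ       # PD updates per policy action

def pd_torque(q, qd, q_target, kp, kd):
    """Torque pulling the joint angle q toward q_target, damped by velocity qd."""
    return kp * (q_target - q) - kd * qd

def step_policy_interval(q, qd, q_target, kp=100.0, kd=20.0, inertia=1.0):
    """Hold one PD target fixed and integrate the joint for one policy period."""
    dt = 1.0 / PD_HZ
    for _ in range(SUBSTEPS):
        tau = pd_torque(q, qd, q_target, kp, kd)
        qd += (tau / inertia) * dt  # semi-implicit Euler: update velocity first
        q += qd * dt                # then position, using the new velocity
    return q, qd

def toy_policy(q, qd):
    """Stand-in for a learned policy: always requests a 0.5 rad target (assumed)."""
    return 0.5

def rollout(seconds=1.0):
    """Query the policy at POLICY_HZ and run the PD loop in between."""
    q, qd = 0.0, 0.0
    for _ in range(int(seconds * POLICY_HZ)):
        q_target = toy_policy(q, qd)
        q, qd = step_policy_interval(q, qd, q_target)
    return q, qd

if __name__ == "__main__":
    print(SUBSTEPS, rollout(1.0))
```

With these rates the policy's action is reused for 50 consecutive PD updates, so the learned policy only has to choose where the joint should go, while the high-rate feedback loop computes the actual actuation commands.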