2018
DOI: 10.48550/arxiv.1804.10332
Preprint

Sim-to-Real: Learning Agile Locomotion For Quadruped Robots

Cited by 122 publications (179 citation statements)
References 25 publications

“…The idea of MEP shares a common intuition with domain randomization, where some features of the environment are changed randomly during training to make the policy robust to those features [45,50,34,42,1,43]. MEP can be seen as a domain randomization technique, where the randomization is conducted over a set of partners' policies.…”
Section: Related Work (mentioning)
confidence: 99%
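To make the domain randomization idea referenced above concrete, the sketch below resamples a few simulator parameters at the start of every training episode so the learned policy cannot overfit to one fixed simulation. The particular parameters, ranges, and class names are illustrative assumptions, not values taken from the cited papers.

```python
import random
from dataclasses import dataclass


# Hypothetical per-episode simulator parameters (illustrative only).
@dataclass
class SimParams:
    base_mass_kg: float        # mass of the robot base
    foot_friction: float       # foot-ground friction coefficient
    actuation_latency_s: float # delay between command and motor response


def sample_randomized_params() -> SimParams:
    """Draw a fresh set of simulator parameters for one training episode."""
    return SimParams(
        base_mass_kg=random.uniform(3.5, 5.5),
        foot_friction=random.uniform(0.5, 1.25),
        actuation_latency_s=random.uniform(0.0, 0.04),
    )


# Training-loop outline: a newly randomized environment per episode forces the
# policy to become robust to the whole parameter distribution, not one sim.
for episode in range(3):
    params = sample_randomized_params()
    print(f"episode {episode}: {params}")
```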
“…Reinforcement learning has been demonstrated to be a promising tool for building controllers for legged robots [7]-[10]. With well-designed reward functions, controllers for locomotion can be trained in simulation and deployed to hardware [7].…”
Section: B. Reference-based RL Controllers For Legged Robotics (mentioning)
confidence: 99%
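As a rough illustration of the "well-designed reward function" idea in the quote above, the sketch below rewards forward base velocity and penalizes mechanical energy, a common shaping for simulated quadruped locomotion. The exact terms and the weight are assumptions made for illustration, not the reward used in the cited work.

```python
import numpy as np


def locomotion_reward(base_velocity_x, joint_torques, joint_velocities,
                      w_energy=0.005):
    """Illustrative locomotion reward: reward forward progress, penalize
    the mechanical power spent to achieve it (weight w_energy is assumed)."""
    forward_term = base_velocity_x
    energy_term = w_energy * abs(float(np.dot(joint_torques, joint_velocities)))
    return forward_term - energy_term


# Example step: moving forward at 0.6 m/s with modest joint effort.
torques = np.array([1.2, -0.8, 0.5, -0.4])
velocities = np.array([2.0, 1.5, -1.0, 0.5])
print(locomotion_reward(0.6, torques, velocities))
```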
“…To reproduce agile locomotion, one key step is to reproduce each gait and all the transitions between gaits with a robust controller for quadruped robots. State-of-the-art controllers, including model-based controllers [3]-[6] and reinforcement learning (RL) controllers [7]-[10], have achieved excellent performance in outdoor locomotion tests with specific gaits. Both types of controllers, however, struggle to transition freely between gaits according to speed or terrain.…”
Section: Introduction (mentioning)
confidence: 99%
“…2) Student: The student policy is distilled from the teacher policy by minimizing the loss function in (9), where L_action penalizes the difference between the student and teacher action, and L_embedding regresses the student wrench and exteroceptive embeddings towards those from the teacher.

L_total = L_action + L_embedding + L_decoder    (9)
L_decoder = L_privileged + L_scan + L_w1 + L_w2    (10)

In (10), L_w1 is the scaled mean-squared error between the decoded wrench and the current external wrench applied to the robot base.…”
Section: Base Policy Training (mentioning)
confidence: 99%
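The distillation objective quoted above (equations (9) and (10)) can be sketched as follows. The quote only states that L_w1 is a (scaled) mean-squared error on the decoded wrench; using MSE for every other term, the function and argument names, and all tensor shapes in the usage example are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def distillation_losses(student_action, teacher_action,
                        student_embedding, teacher_embedding,
                        decoded_privileged, true_privileged,
                        decoded_scan, true_scan,
                        decoded_wrench, true_wrench,
                        decoded_wrench_aux, true_wrench_aux):
    """Sketch of L_total = L_action + L_embedding + L_decoder, with
    L_decoder = L_privileged + L_scan + L_w1 + L_w2 (MSE form assumed)."""
    l_action = F.mse_loss(student_action, teacher_action)
    l_embedding = F.mse_loss(student_embedding, teacher_embedding)
    l_decoder = (F.mse_loss(decoded_privileged, true_privileged)   # L_privileged
                 + F.mse_loss(decoded_scan, true_scan)             # L_scan
                 + F.mse_loss(decoded_wrench, true_wrench)         # L_w1
                 + F.mse_loss(decoded_wrench_aux, true_wrench_aux))  # L_w2
    return l_action + l_embedding + l_decoder


# Minimal usage with random tensors of placeholder shapes (batch of 8).
b = 8
loss = distillation_losses(torch.randn(b, 12), torch.randn(b, 12),    # actions
                           torch.randn(b, 64), torch.randn(b, 64),    # embeddings
                           torch.randn(b, 32), torch.randn(b, 32),    # privileged state
                           torch.randn(b, 128), torch.randn(b, 128),  # height scan
                           torch.randn(b, 6), torch.randn(b, 6),      # wrench (L_w1)
                           torch.randn(b, 6), torch.randn(b, 6))      # wrench aux (L_w2)
print(loss.item())
```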