2021
DOI: 10.48550/arxiv.2107.06629
Preprint

Model-free Reinforcement Learning for Robust Locomotion using Demonstrations from Trajectory Optimization

Abstract: In this work we present a general, two-stage reinforcement learning approach for going from a single demonstration trajectory to a robust policy that can be deployed on hardware without any additional training. The demonstration is used in the first stage as a starting point to facilitate initial exploration. In the second stage, the relevant task reward is optimized directly and a policy robust to environment uncertainties is computed. We demonstrate and examine in detail performance and robustness of our app…
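
The abstract describes a two-stage structure: first imitate a single demonstration to bootstrap exploration, then optimize the task reward directly while exposing the policy to environment uncertainty. The Python sketch below illustrates only that structure on a toy 1-D point mass; the environment, the linear feedback policy, and the random-search update are assumptions made purely for illustration and are not the paper's model-free RL implementation.

import numpy as np

rng = np.random.default_rng(0)
DT, HORIZON, TARGET = 0.05, 60, 1.0

def rollout(gains, mass=1.0):
    # Simulate a 1-D point mass under a PD-style policy u = k1*(target - x) - k2*v.
    x, v, states = 0.0, 0.0, []
    for _ in range(HORIZON):
        u = gains[0] * (TARGET - x) - gains[1] * v
        v += (u / mass) * DT
        x += v * DT
        states.append(x)
    return np.array(states)

def random_search(score_fn, gains, iters=300, sigma=0.2):
    # Simple hill climbing on the policy parameters; a stand-in for the RL update.
    best = score_fn(gains)
    for _ in range(iters):
        candidate = gains + sigma * rng.standard_normal(2)
        score = score_fn(candidate)
        if score > best:
            gains, best = candidate, score
    return gains

# A single "demonstration": a smooth reference trajectory that reaches the target.
demo = TARGET * (1.0 - np.exp(-np.linspace(0.0, 3.0, HORIZON)))

def stage1_score(g):
    # Stage 1: imitation-style reward, i.e. negative squared distance to the demonstration.
    return -float(np.mean((rollout(g) - demo) ** 2))

def stage2_score(g):
    # Stage 2: task reward (reach the target) averaged over randomized dynamics
    # (a range of masses), standing in for robustness to environment uncertainty.
    return -float(np.mean([(rollout(g, m)[-1] - TARGET) ** 2
                           for m in np.linspace(0.5, 2.0, 5)]))

gains = random_search(stage1_score, np.zeros(2))   # bootstrap from the demonstration
gains = random_search(stage2_score, gains)         # then optimize the task reward directly
print("final gains:", gains, "final position:", rollout(gains)[-1])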

Cited by 6 publications (10 citation statements)
References 13 publications
“…Model-free RL for Robust Locomotion using Demonstrations from Trajectory Optimization [109] ArXiv 2022…”
Section: Isaac PyBullet Unitree A1
Mentioning confidence: 99%
“…Model-free Reinforcement Learning for Robust Locomotion using Demonstrations from Trajectory Optimization [109] Trajectory Optimization Algorithm [118].…”
Section: Financial Support and Sponsorship
Mentioning confidence: 99%
“…To provide the user more control over the behaviors, reference trajectories can be provided to encourage desired motions. One can design a reward function to explicitly track the reference trajectories, e.g., [15], [16], [17]. Inverse reinforcement learning techniques such as adversarial motion priors can also be used to learn a reward function to encourage the policy to produce motions that look similar to a prescribed motion dataset, e.g., [18], [19].…”
Section: Imitation-based Reinforcement Learning For Legged Robots
Mentioning confidence: 99%
“…In very recent work [22] that is perhaps most similar to ours, a single trajectory was generated with a motion planner and RL was used to imitate it in a timing independent way and learn a bounding or hopping behavior which was finetuned and evaluated on a robot. Our work is different in that we generate an entire dataset of trajectories and use it for learning terrain adaptive policies that can generalize and be fine-tuned for a variety of challenging terrains.…”
Section: Planner Imitation For Legged Robots
Mentioning confidence: 99%