Robotics: Science and Systems XVIII 2022
DOI: 10.15607/rss.2022.xviii.022
Rapid Locomotion via Reinforcement Learning

Abstract: Agile maneuvers such as sprinting and high-speed turning in the wild are challenging for legged robots. We present an end-to-end learned controller that achieves record agility for the MIT Mini Cheetah, sustaining speeds up to 3.9 m/s. This system runs and turns fast on natural terrains like grass, ice, and gravel and responds robustly to disturbances. Our controller is a neural network trained in simulation via reinforcement learning and transferred to the real world. The two key components are (i) an adaptiv…

Cited by 75 publications (32 citation statements) · References 1 publication
“…2) Model-free RL for legged locomotion control: Recent years have seen exciting progress on using deep RL to learn locomotion controllers for quadrupedal robots [38]-[41] and bipedal robots [42]-[47] in the real world. Since it is challenging in general to learn a single policy with RL to perform various tasks [48], many prior works focus on learning a single-task policy [16], [17], [49], [50] for legged robots, such as just forward walking [39], [51], [52]. There have been efforts to obtain a multi-task policy, such as walking at different velocities using different gaits, conditioned only on variable commands [46], [53]-[55], which requires more extensive tuning due to the lack of a gait prior.…”
Section: Related Work
confidence: 99%
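For concreteness, the command-conditioned multi-task policy described in this passage can be as simple as an MLP that consumes proprioceptive observations concatenated with a commanded body velocity. The sketch below is a minimal illustration, not the architecture of any cited work; the dimensions and names (PROPRIO_DIM, CMD_DIM, ACTION_DIM) are assumptions.

    import torch
    import torch.nn as nn

    PROPRIO_DIM = 48   # assumed: joint positions/velocities, orientation, last action
    CMD_DIM = 3        # assumed: commanded (v_x, v_y, yaw rate)
    ACTION_DIM = 12    # assumed: target joint positions for a quadruped

    class CommandConditionedPolicy(nn.Module):
        """MLP policy conditioned on a velocity command.

        One network covers many 'tasks' (walk fast or slow, turn) because
        the task is encoded in the command input rather than in the weights.
        """
        def __init__(self, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(PROPRIO_DIM + CMD_DIM, hidden), nn.ELU(),
                nn.Linear(hidden, hidden), nn.ELU(),
                nn.Linear(hidden, ACTION_DIM),
            )

        def forward(self, proprio, command):
            # Conditioning is plain concatenation of observation and command.
            return self.net(torch.cat([proprio, command], dim=-1))

    policy = CommandConditionedPolicy()
    action = policy(torch.zeros(1, PROPRIO_DIM), torch.tensor([[2.0, 0.0, 0.5]]))

Because the command is just another input, changing the target velocity at deployment time requires no retraining, which is what makes this a multi-task policy in the sense used above.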
“…Since performing rollouts on the hardware of human-scale bipedal robots is expensive, we use the zero-shot transfer method. To realize this, there are two widely adopted techniques: (i) end-to-end training of a policy by providing the robot with a proprioceptive short-term history [39], [45], [57] or long-term history [44], [62], [68]; (ii) teacher-student training, which first obtains a teacher policy with privileged information about the environment via RL, then uses this policy to supervise the training of a student policy that only has access to onboard-available observations [18], [38], [40], [42], [52], [55], and which has shown advantages over end-to-end training [38], [52], [70]. However, here we show that, for the dynamic control of bipedal robots, training the robot end-to-end with a newly proposed policy structure can achieve better learning performance than the teacher-student method, which separates the training process and requires more data.…”
Section: Related Work
confidence: 99%
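As a rough illustration of the second route contrasted above, the sketch below shows the supervision step of teacher-student training: a frozen teacher that sees privileged simulator state produces target actions, and a student that sees only a proprioceptive history is regressed onto them. All names and dimensions are assumptions, not any cited system.

    import torch
    import torch.nn as nn

    PRIV_DIM, OBS_DIM, HIST_LEN, ACT_DIM = 24, 48, 50, 12  # assumed sizes

    teacher = nn.Sequential(  # sees privileged state (e.g. friction, true velocity)
        nn.Linear(OBS_DIM + PRIV_DIM, 256), nn.ELU(), nn.Linear(256, ACT_DIM))
    student = nn.Sequential(  # sees only a history of onboard observations
        nn.Linear(OBS_DIM * HIST_LEN, 256), nn.ELU(), nn.Linear(256, ACT_DIM))

    opt = torch.optim.Adam(student.parameters(), lr=3e-4)

    def distill_step(obs, priv, obs_history):
        """One behavior-cloning step: regress student actions onto teacher actions."""
        with torch.no_grad():  # teacher was trained earlier with RL and is frozen
            target = teacher(torch.cat([obs, priv], dim=-1))
        pred = student(obs_history.flatten(1))
        loss = ((pred - target) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

    # e.g. distill_step(torch.randn(64, OBS_DIM), torch.randn(64, PRIV_DIM),
    #                   torch.randn(64, HIST_LEN, OBS_DIM))

The end-to-end alternative skips this second stage entirely: the history-conditioned policy is trained directly with RL, which is what the quoted passage argues can work better for dynamic bipedal control.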
“…Empirical results show that as discrepancies between the training and deployment environments become more pronounced, invariance through latent alignment has a large competitive edge over alternatives such as data augmentation techniques. The problem of test-time adaptation in visual reinforcement learning using unsupervised test-time trajectories is relatively new, but has thus far shown great relevance and promise in robotics [11,13], where a sim-to-real pipeline has been at the forefront of recent progress [20,28,41].…”
Section: Closing Remarks
confidence: 99%
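The passage refers to "invariance through latent alignment" only at a high level; one simple way to make the idea concrete is moment matching: record the mean and variance of the encoder's latent features on training data, then fine-tune the encoder at test time so that latents from unlabeled deployment trajectories match those statistics. The sketch below is such a minimal instantiation under assumed names and dimensions, not the method of the cited paper.

    import torch
    import torch.nn as nn

    encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))

    # Latent statistics saved from the training environment (assumed given).
    train_mu, train_var = torch.zeros(32), torch.ones(32)

    opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)

    def adapt_step(test_obs):
        """Unsupervised test-time step: align latent moments with training stats."""
        z = encoder(test_obs)  # latents from unlabeled deployment observations
        loss = ((z.mean(0) - train_mu) ** 2).mean() + \
               ((z.var(0) - train_var) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

No rewards or labels are needed at deployment, which is what makes this a test-time (rather than training-time) adaptation scheme.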
“…Reinforcement learning for control has achieved great success in a wide variety of challenging sensory-motor control tasks, including agile drone flight [20,21,26], deformable object manipulation [41], and quadruped locomotion [19,24,28,30]. Compared with their classical model-predictive control counterparts, reinforcement-learning-based approaches enable the use of a more realistic forward dynamics model in the form of a physics simulator.…”
Section: Introduction
confidence: 99%
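The practical point in that last sentence is that an RL trainer touches the dynamics only through reset() and step(), so an arbitrarily detailed physics simulator can stand in for the analytic model an MPC controller would require. A schematic rollout-collection loop under a hypothetical gym-style environment interface:

    def collect_rollout(env, policy, horizon=1000):
        """Collect one on-policy trajectory; the simulator is a black box here.

        `env` can be any physics simulator exposing reset()/step(); no
        differentiable or analytic dynamics model is required, unlike
        model-predictive control.
        """
        obs = env.reset()
        trajectory = []
        for _ in range(horizon):
            action = policy(obs)  # policy network forward pass
            next_obs, reward, done, info = env.step(action)
            trajectory.append((obs, action, reward, next_obs, done))
            obs = env.reset() if done else next_obs
        return trajectory  # fed to a policy-gradient update (e.g. PPO)

This black-box access is why contact-rich dynamics (legs striking ground, slipping on ice) can be modeled at full simulator fidelity during training rather than simplified into a tractable analytic form.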