2021
DOI: 10.1109/LRA.2021.3066833

Learning Spring Mass Locomotion: Guiding Policies With a Reduced-Order Model

Abstract: In this paper, we describe an approach to achieve dynamic legged locomotion on physical robots which combines existing methods for control with reinforcement learning. Specifically, our goal is a control hierarchy in which the highest-level behaviors are planned through reduced-order models, which describe the fundamental physics of legged locomotion, and lower-level controllers utilize a learned policy that can bridge the gap between the idealized, simple model and the complex, full-order robot. The high-level pla…
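The control hierarchy the abstract describes can be summarized in a short sketch; planner and policy below are assumed interfaces for illustration, not the paper's actual code.

import numpy as np

def control_step(robot_state, planner, policy):
    # high level: plan behavior through a reduced-order (spring-mass) model
    rom_target = planner(robot_state)
    # low level: a learned policy bridges the simple model and the full robot
    obs = np.concatenate([robot_state, rom_target])
    return policy(obs)  # joint-level commands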

Cited by 38 publications (12 citation statements)
References 23 publications
“…Various methods have been proposed to solve trajectory optimization efficiently, including collocation methods, e.g., [1], [2], [3], and shooting-based methods, e.g., [4], [5]. Simplified models such as the single rigid-body dynamics model or the inverted pendulum can also be used to obtain approximate solutions [6], [7].…”
Section: A. Trajectory Optimization for Legged Robots (mentioning)
confidence: 99%
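As a rough illustration of the shooting-based approach this statement mentions, the sketch below optimizes an open-loop control sequence for a 1-D double integrator standing in for a reduced-order CoM model; the dynamics, cost weights, and horizon are illustrative assumptions, not taken from the cited works.

import numpy as np
from scipy.optimize import minimize

DT, N = 0.05, 40             # integration step and horizon length
X0 = np.array([0.0, 0.0])    # initial position and velocity
X_GOAL = np.array([1.0, 0.0])

def rollout(u):
    # integrate x'' = u forward with explicit Euler (single shooting)
    x = X0.copy()
    for u_k in u:
        x = x + DT * np.array([x[1], u_k])
    return x

def shooting_cost(u):
    # control effort plus a terminal penalty on missing the goal state
    return DT * np.sum(u**2) + 100.0 * np.sum((rollout(u) - X_GOAL)**2)

res = minimize(shooting_cost, np.zeros(N), method="L-BFGS-B")
print("terminal state:", rollout(res.x))

A collocation method would instead treat the states at each knot point as decision variables and enforce the dynamics as equality constraints, rather than integrating them forward as above.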
“…In (15), the position, orientation, and joint rewards are the basic motion-imitation rewards, with desired values x corresponding to those of the reference motion, whereas the action-difference and maximum-torque rewards are regularizers designed to mitigate specific sim-to-real issues that we observed. The action-difference reward penalizes large differences between actions in consecutive RL environment steps to limit vibration and encourage smooth motions [7]. The maximum-torque reward penalizes the maximum joint torque observed across all eight joints over the 20 ms integration interval between steps, since current spikes were observed to cause faults in the motor-controller hardware during experiments: the power supply units we used were not dynamic enough to maintain the required voltage during motions involving large impacts with the ground.…”
Section: Imitation-Based Reinforcement Learning (mentioning)
confidence: 99%
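A hedged sketch of a reward with the two regularizers described above; the weights, scales, and exponential shaping are placeholder assumptions, not the citing paper's actual values.

import numpy as np

def imitation_reward(x, x_ref, a, a_prev, joint_torques,
                     w=(0.6, 0.2, 0.2), sigma=(5.0, 1.0, 50.0)):
    # basic imitation term: track the reference motion's desired values
    r_track = np.exp(-sigma[0] * np.sum((x - x_ref) ** 2))
    # action-difference regularizer: penalize jumps between consecutive steps
    r_smooth = np.exp(-sigma[1] * np.sum((a - a_prev) ** 2))
    # maximum-torque regularizer: penalize the largest joint torque observed
    r_torque = np.exp(-np.max(np.abs(joint_torques)) / sigma[2])
    return w[0] * r_track + w[1] * r_smooth + w[2] * r_torque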
“…Because of the complex dynamics of legged robots, several reduced-order dynamic models have been developed for legged-robot control. For bipedal robots, the inverted pendulum model and its variants have been widely used [19], [20], [21], [22]. For quadrupedal robots, simplifying the robot into a single rigid body driven by the sum of external forces from the stance legs is a reliable approach to control [1], [23], [2].…”
Section: B. Model-Based Legged Locomotion Control (mentioning)
confidence: 99%
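For concreteness, here is a minimal rollout of the linear inverted pendulum, one variant of the reduced-order models this statement names; the CoM height, time step, and initial state are illustrative assumptions.

import numpy as np

G, Z0 = 9.81, 0.9  # gravity and an assumed constant CoM height (m)

def lip_step(x, p_foot, dt):
    # Linear Inverted Pendulum: CoM acceleration = (g / z0) * (x_com - p_foot)
    pos, vel = x
    acc = (G / Z0) * (pos - p_foot)
    return np.array([pos + dt * vel, vel + dt * acc])

x = np.array([0.05, 0.0])      # CoM slightly ahead of the stance foot
for _ in range(100):
    x = lip_step(x, p_foot=0.0, dt=0.01)
print("CoM state after 1 s:", x)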
“…In contrast to end-to-end frameworks where the action is learned entirely, summing model-based control signals with the actions of learned policies provides external guidance and avoids pointless blind exploration during training. Low-cost model-based controllers such as central pattern generators [19, 22, 23], model-based gait libraries [24, 25, 26], and heuristic references [27, 28] are often adopted in these approaches to achieve agile, real-time controlled locomotion. In these frameworks, NN policies learn the residual between the optimal decision and the reference given by the model-based modules.…”
Section: Introduction (mentioning)
confidence: 99%
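The residual structure described here can be sketched in a few lines; model_based_ctrl, policy, and the residual scale are hypothetical stand-ins for whichever guidance module (CPG, gait library, heuristic reference) and network a given framework uses.

import numpy as np

def residual_action(obs, model_based_ctrl, policy, scale=0.1):
    # model-based reference provides guidance; the NN learns only a correction
    u_ref = model_based_ctrl(obs)
    delta = policy(obs)          # residual, e.g., network output in [-1, 1]
    return u_ref + scale * delta

# usage with placeholder callables standing in for the real modules
u = residual_action(np.zeros(4),
                    model_based_ctrl=lambda o: np.ones(2),
                    policy=lambda o: 0.5 * np.ones(2))

Scaling the residual keeps the learned correction small relative to the model-based reference, which is one way such frameworks constrain exploration early in training.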