2021
DOI: 10.1109/LRA.2021.3066833

Learning Spring Mass Locomotion: Guiding Policies With a Reduced-Order Model

Abstract: In this paper, we describe an approach to achieve dynamic legged locomotion on physical robots which combines existing methods for control with reinforcement learning. Specifically, our goal is a control hierarchy in which the highest-level behaviors are planned through reduced-order models, which describe the fundamental physics of legged locomotion, and lower-level controllers utilize a learned policy that can bridge the gap between the idealized, simple model and the complex, full-order robot. The high-level pla…
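The control hierarchy the abstract describes can be summarized in a short sketch; planner and policy below are assumed interfaces for illustration, not the paper's actual code.

import numpy as np

def control_step(robot_state, planner, policy):
    # high level: plan behavior through a reduced-order (spring-mass) model
    rom_target = planner(robot_state)
    # low level: a learned policy bridges the simple model and the full robot
    obs = np.concatenate([robot_state, rom_target])
    return policy(obs)  # joint-level commands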

Cited by 38 publications (12 citation statements)
References 23 publications
“…Various methods have been proposed to solve trajectory optimization efficiently, including collocation methods, e.g., [1], [2], [3], and shooting-based methods, e.g., [4], [5]. Simplified models such as the single rigid-body dynamics model or the inverted pendulum can also be used to obtain approximate solutions [6], [7].…”
Section: A. Trajectory Optimization for Legged Robots (mentioning)
confidence: 99%
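As a rough illustration of the shooting-based approach this statement mentions, the sketch below optimizes an open-loop control sequence for a 1-D double integrator standing in for a reduced-order CoM model; the dynamics, cost weights, and horizon are illustrative assumptions, not taken from the cited works.

import numpy as np
from scipy.optimize import minimize

DT, N = 0.05, 40             # integration step and horizon length
X0 = np.array([0.0, 0.0])    # initial position and velocity
X_GOAL = np.array([1.0, 0.0])

def rollout(u):
    # integrate x'' = u forward with explicit Euler (single shooting)
    x = X0.copy()
    for u_k in u:
        x = x + DT * np.array([x[1], u_k])
    return x

def shooting_cost(u):
    # control effort plus a terminal penalty on missing the goal state
    return DT * np.sum(u**2) + 100.0 * np.sum((rollout(u) - X_GOAL)**2)

res = minimize(shooting_cost, np.zeros(N), method="L-BFGS-B")
print("terminal state:", rollout(res.x))

A collocation method would instead treat the states at each knot point as decision variables and enforce the dynamics as equality constraints, rather than integrating them forward as above.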
“…In (15), the position, orientation, and joint rewards are the basic motion-imitation rewards, with desired values x corresponding to those of the reference motion, whereas the action-difference and maximum-torque rewards are regularizers designed to mitigate specific sim-to-real issues that we observed. The action-difference reward penalizes large differences between actions in consecutive RL environment steps to limit vibration and encourage smooth motions [7]. The maximum-torque reward penalizes the maximum joint torque observed across all eight joints over the 20 ms integration interval between steps, since current spikes were observed to cause faults in the motor-controller hardware during experiments: the power supply units we used were not dynamic enough to maintain the required voltage during motions involving large impacts with the ground.…”
Section: Imitation-Based Reinforcement Learning (mentioning)
confidence: 99%
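A hedged sketch of a reward with the two regularizers described above; the weights, scales, and exponential shaping are placeholder assumptions, not the citing paper's actual values.

import numpy as np

def imitation_reward(x, x_ref, a, a_prev, joint_torques,
                     w=(0.6, 0.2, 0.2), sigma=(5.0, 1.0, 50.0)):
    # basic imitation term: track the reference motion's desired values
    r_track = np.exp(-sigma[0] * np.sum((x - x_ref) ** 2))
    # action-difference regularizer: penalize jumps between consecutive steps
    r_smooth = np.exp(-sigma[1] * np.sum((a - a_prev) ** 2))
    # maximum-torque regularizer: penalize the largest joint torque observed
    r_torque = np.exp(-np.max(np.abs(joint_torques)) / sigma[2])
    return w[0] * r_track + w[1] * r_smooth + w[2] * r_torque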
“…Because of the complex dynamics of legged robots, several reduced-order dynamic models have been developed for legged-robot control. For bipedal robots, the inverted pendulum model and its variants have been widely used [19], [20], [21], [22]. For quadrupedal robots, simplifying the robot into a single rigid body driven by the sum of external forces from the stance legs is a reliable approach to control [1], [23], [2].…”
Section: B. Model-Based Legged Locomotion Control (mentioning)
confidence: 99%
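For concreteness, here is a minimal rollout of the linear inverted pendulum, one variant of the reduced-order models this statement names; the CoM height, time step, and initial state are illustrative assumptions.

import numpy as np

G, Z0 = 9.81, 0.9  # gravity and an assumed constant CoM height (m)

def lip_step(x, p_foot, dt):
    # Linear Inverted Pendulum: CoM acceleration = (g / z0) * (x_com - p_foot)
    pos, vel = x
    acc = (G / Z0) * (pos - p_foot)
    return np.array([pos + dt * vel, vel + dt * acc])

x = np.array([0.05, 0.0])      # CoM slightly ahead of the stance foot
for _ in range(100):
    x = lip_step(x, p_foot=0.0, dt=0.01)
print("CoM state after 1 s:", x)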
“…In contrast to end-to-end frameworks where the action is learned entirely, summing model-based control signals with the actions of learned policies provides external guidance and avoids pointless blind exploration during training. Low-cost model-based controllers such as central pattern generators [19, 22, 23], model-based gait libraries [24, 25, 26], and heuristic references [27, 28] are often adopted in these approaches to achieve agile, real-time controlled locomotion. In these frameworks, NN policies learn the residual between the optimal decision and the reference given by the model-based modules.…”
Section: Introduction (mentioning)
confidence: 99%
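The residual structure described here can be sketched in a few lines; model_based_ctrl, policy, and the residual scale are hypothetical stand-ins for whichever guidance module (CPG, gait library, heuristic reference) and network a given framework uses.

import numpy as np

def residual_action(obs, model_based_ctrl, policy, scale=0.1):
    # model-based reference provides guidance; the NN learns only a correction
    u_ref = model_based_ctrl(obs)
    delta = policy(obs)          # residual, e.g., network output in [-1, 1]
    return u_ref + scale * delta

# usage with placeholder callables standing in for the real modules
u = residual_action(np.zeros(4),
                    model_based_ctrl=lambda o: np.ones(2),
                    policy=lambda o: 0.5 * np.ones(2))

Scaling the residual keeps the learned correction small relative to the model-based reference, which is one way such frameworks constrain exploration early in training.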