Robotics: Science and Systems XVI 2020
DOI: 10.15607/rss.2020.xvi.031

Learning Memory-Based Control for Human-Scale Bipedal Locomotion

Abstract: Controlling a non-statically stable biped is a difficult problem largely due to the complex hybrid dynamics involved. Recent work has demonstrated the effectiveness of reinforcement learning (RL) for simulation-based training of neural network controllers that successfully transfer to real bipeds. The existing work, however, has primarily used simple memoryless network architectures, even though more sophisticated architectures, such as those including memory, often yield superior performance in other RL domains…

Cited by 62 publications (35 citation statements)
References 14 publications
“…We represent the control policy as an LSTM recurrent neural network [17], with two recurrent hidden layers of dimension 128 each. We opt to use a memory-enabled network because of previous work demonstrating a higher degree of proficiency in handling partially observable environments [18], [16], [19]. For ablation experiments involving non-memory-based control policies, we use a standard feedforward neural network with two layers of dimension 300, with tanh activation functions, such that the number of parameters is approximately equal to that of the LSTM network.…”
Section: E. Policy Representation and Learning
confidence: 99%
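The two architectures described in this quote can be sketched in a few lines. The following is a hedged PyTorch example, not the authors' code: the observation and action dimensions are hypothetical placeholders, while the layer sizes (two recurrent layers of 128, two tanh layers of 300) follow the quoted text.

```python
# Sketch of the two policy architectures in the quote above (assumes PyTorch;
# observation/action dimensions are placeholders, not taken from the paper).
import torch.nn as nn

OBS_DIM, ACT_DIM = 40, 10  # hypothetical sizes for illustration only

class LSTMPolicy(nn.Module):
    def __init__(self, obs_dim=OBS_DIM, act_dim=ACT_DIM, hidden=128):
        super().__init__()
        # Two recurrent hidden layers of dimension 128 each.
        self.lstm = nn.LSTM(obs_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim); state carries the LSTM memory.
        out, state = self.lstm(obs_seq, state)
        return self.head(out), state

class FeedforwardPolicy(nn.Module):
    def __init__(self, obs_dim=OBS_DIM, act_dim=ACT_DIM, hidden=300):
        super().__init__()
        # Two tanh layers of dimension 300, chosen so the parameter count is
        # roughly comparable to the LSTM policy, per the quoted ablation setup.
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)
```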
“…During training, we make use of a mirror loss term [21] in order to ensure that the control policy does not learn asymmetric gaits. For recurrent policies, we sample batches of episodes from a replay buffer as in [19], while for feedforward policies we sample batches of timesteps. Each episode is limited to 300 timesteps, which corresponds to about 7.5 seconds of simulation time.…”
Section: E. Policy Representation and Learning
confidence: 99%
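The sampling distinction drawn in this quote (whole episodes for recurrent policies, individual timesteps for feedforward ones) is easy to make concrete. The sketch below is a minimal illustration under an assumed buffer layout and transition format, not the cited implementation.

```python
# Minimal replay buffer illustrating the two sampling schemes mentioned in the
# quote; data layout and naming are assumptions, not the cited paper's code.
import random

MAX_EPISODE_LEN = 300  # ~7.5 s of simulation, i.e. roughly a 40 Hz timestep

class ReplayBuffer:
    def __init__(self):
        self.episodes = []  # each episode is a list of transition tuples

    def add_episode(self, transitions):
        # Episodes are capped at 300 timesteps, as in the quoted setup.
        self.episodes.append(transitions[:MAX_EPISODE_LEN])

    def sample_episodes(self, batch_size):
        # Recurrent policies: sample whole episodes so the LSTM hidden state
        # can be unrolled from the start of each trajectory.
        return random.sample(self.episodes, batch_size)

    def sample_timesteps(self, batch_size):
        # Feedforward policies: sample individual transitions uniformly.
        flat = [t for ep in self.episodes for t in ep]
        return random.sample(flat, batch_size)
```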
“…To quantify the matching of the robot's natural dynamics and the control task dynamics we use the neuroelastic activity as a proxy. If the dynamics do not […] To optimize and tune the control tasks, different methods have been used, such as optimization [41]-[43], self-modeling [44], adaptive CPGs [45]-[47], and machine learning techniques [21], [47]-[52]. For this study, we apply Bayesian optimization [53], [54] to minimize the amount […] is coupled through mechanical coupling.…”
confidence: 99%
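The Bayesian optimization step referenced in this quote can be illustrated with a toy example. The sketch below assumes scikit-optimize's gp_minimize; the objective, parameter names, and bounds are placeholders and not the cited study's actual setup.

```python
# Toy Bayesian-optimization loop for tuning controller parameters, assuming
# scikit-optimize; the objective and bounds are illustrative placeholders.
from skopt import gp_minimize

def rollout_cost(params):
    gain, frequency = params
    # Placeholder objective: in practice this would run a simulation or
    # hardware rollout and return the quantity to be minimized.
    return (gain - 2.0) ** 2 + (frequency - 1.5) ** 2

result = gp_minimize(
    rollout_cost,
    dimensions=[(0.0, 5.0), (0.5, 3.0)],  # assumed bounds for gain, frequency
    n_calls=30,
    random_state=0,
)
print("best parameters:", result.x, "best cost:", result.fun)
```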
“…While optimization and learning in simulation are efficient and cheap, the transfer of control policies can be difficult due to the sim2real gap [20], [21], [52]. We examine the transferability of our approach by quantifying the sim2real gap between simulation and hardware experiments.…”
confidence: 99%
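One simple way to quantify such a sim2real gap is to compare the same logged signals from simulation and hardware runs. The sketch below uses an RMSE-style metric as an assumed example; it is not necessarily the measure used in the citing paper.

```python
# Illustrative sim2real gap metric: RMSE between time-aligned signals logged
# in simulation and on hardware (an assumed metric, not the paper's own).
import numpy as np

def sim2real_gap(sim_traj, real_traj):
    """RMSE between sim and hardware trajectories of shape (timesteps, signals)."""
    n = min(len(sim_traj), len(real_traj))  # truncate to the common length
    diff = np.asarray(sim_traj[:n]) - np.asarray(real_traj[:n])
    return float(np.sqrt(np.mean(diff ** 2)))

# Example with synthetic stand-in data for logged joint angles.
t = np.linspace(0.0, 7.5, 300)[:, None]
sim = np.sin(t)
real = np.sin(t) + 0.05 * np.random.randn(*t.shape)
print("sim2real gap (RMSE):", sim2real_gap(sim, real))
```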