Robotics: Science and Systems XVI 2020
DOI: 10.15607/rss.2020.xvi.031

Learning Memory-Based Control for Human-Scale Bipedal Locomotion

Abstract: Controlling a non-statically stable biped is a difficult problem largely due to the complex hybrid dynamics involved. Recent work has demonstrated the effectiveness of reinforcement learning (RL) for simulation-based training of neural network controllers that successfully transfer to real bipeds. The existing work, however, has primarily used simple memoryless network architectures, even though more sophisticated architectures, such as those including memory, often yield superior performance in other RL domains…

Cited by 62 publications (35 citation statements)
References 14 publications
“…We represent the control policy as an LSTM recurrent neural network [17], with two recurrent hidden layers of dimension 128 each. We opt to use a memory-enabled network because of previous work demonstrating a higher degree of proficiency in handling partially observable environments [18], [16], [19]. For ablation experiments involving non-memory-based control policies, we use a standard feedforward neural network with two layers of dimension 300, with tanh activation functions, such that the number of parameters is approximately equal to that of the LSTM network.…”
Section: E. Policy Representation and Learning
confidence: 99%
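The two architectures described in this quote can be sketched in a few lines. The following is a hedged PyTorch example, not the authors' code: the observation and action dimensions are hypothetical placeholders, while the layer sizes (two recurrent layers of 128, two tanh layers of 300) follow the quoted text.

```python
# Sketch of the two policy architectures in the quote above (assumes PyTorch;
# observation/action dimensions are placeholders, not taken from the paper).
import torch.nn as nn

OBS_DIM, ACT_DIM = 40, 10  # hypothetical sizes for illustration only

class LSTMPolicy(nn.Module):
    def __init__(self, obs_dim=OBS_DIM, act_dim=ACT_DIM, hidden=128):
        super().__init__()
        # Two recurrent hidden layers of dimension 128 each.
        self.lstm = nn.LSTM(obs_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim); state carries the LSTM memory.
        out, state = self.lstm(obs_seq, state)
        return self.head(out), state

class FeedforwardPolicy(nn.Module):
    def __init__(self, obs_dim=OBS_DIM, act_dim=ACT_DIM, hidden=300):
        super().__init__()
        # Two tanh layers of dimension 300, chosen so the parameter count is
        # roughly comparable to the LSTM policy, per the quoted ablation setup.
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)
```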
“…During training, we make use of a mirror loss term [21] in order to ensure that the control policy does not learn asymmetric gaits. For recurrent policies, we sample batches of episodes from a replay buffer as in [19], while for feedforward policies we sample batches of timesteps. Each episode is limited to 300 timesteps, which corresponds to about 7.5 seconds of simulation time.…”
Section: E. Policy Representation and Learning
confidence: 99%
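The sampling distinction drawn in this quote (whole episodes for recurrent policies, individual timesteps for feedforward ones) is easy to make concrete. The sketch below is a minimal illustration under an assumed buffer layout and transition format, not the cited implementation.

```python
# Minimal replay buffer illustrating the two sampling schemes mentioned in the
# quote; data layout and naming are assumptions, not the cited paper's code.
import random

MAX_EPISODE_LEN = 300  # ~7.5 s of simulation, i.e. roughly a 40 Hz timestep

class ReplayBuffer:
    def __init__(self):
        self.episodes = []  # each episode is a list of transition tuples

    def add_episode(self, transitions):
        # Episodes are capped at 300 timesteps, as in the quoted setup.
        self.episodes.append(transitions[:MAX_EPISODE_LEN])

    def sample_episodes(self, batch_size):
        # Recurrent policies: sample whole episodes so the LSTM hidden state
        # can be unrolled from the start of each trajectory.
        return random.sample(self.episodes, batch_size)

    def sample_timesteps(self, batch_size):
        # Feedforward policies: sample individual transitions uniformly.
        flat = [t for ep in self.episodes for t in ep]
        return random.sample(flat, batch_size)
```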
“…To quantify the matching of the robot's natural dynamics and the control task dynamics we use the neuroelastic activity as a proxy. If the dynamics do not […] To optimize and tune the control tasks, different methods have been used, such as optimization [41]-[43], self-modeling [44], adaptive CPGs [45]-[47], and machine learning techniques [21], [47]-[52]. For this study, we apply Bayesian optimization [53], [54] to minimize the amount […] is coupled through mechanical coupling.…”
confidence: 99%
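The Bayesian optimization step referenced in this quote can be illustrated with a toy example. The sketch below assumes scikit-optimize's gp_minimize; the objective, parameter names, and bounds are placeholders and not the cited study's actual setup.

```python
# Toy Bayesian-optimization loop for tuning controller parameters, assuming
# scikit-optimize; the objective and bounds are illustrative placeholders.
from skopt import gp_minimize

def rollout_cost(params):
    gain, frequency = params
    # Placeholder objective: in practice this would run a simulation or
    # hardware rollout and return the quantity to be minimized.
    return (gain - 2.0) ** 2 + (frequency - 1.5) ** 2

result = gp_minimize(
    rollout_cost,
    dimensions=[(0.0, 5.0), (0.5, 3.0)],  # assumed bounds for gain, frequency
    n_calls=30,
    random_state=0,
)
print("best parameters:", result.x, "best cost:", result.fun)
```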
“…While optimization and learning in simulation are efficient and cheap, the transfer of control policies can be difficult due to the sim2real gap [20], [21], [52]. We examine the transferability of our approach by quantifying the sim2real gap between simulation and hardware experiments.…”
confidence: 99%
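One simple way to quantify such a sim2real gap is to compare the same logged signals from simulation and hardware runs. The sketch below uses an RMSE-style metric as an assumed example; it is not necessarily the measure used in the citing paper.

```python
# Illustrative sim2real gap metric: RMSE between time-aligned signals logged
# in simulation and on hardware (an assumed metric, not the paper's own).
import numpy as np

def sim2real_gap(sim_traj, real_traj):
    """RMSE between sim and hardware trajectories of shape (timesteps, signals)."""
    n = min(len(sim_traj), len(real_traj))  # truncate to the common length
    diff = np.asarray(sim_traj[:n]) - np.asarray(real_traj[:n])
    return float(np.sqrt(np.mean(diff ** 2)))

# Example with synthetic stand-in data for logged joint angles.
t = np.linspace(0.0, 7.5, 300)[:, None]
sim = np.sin(t)
real = np.sin(t) + 0.05 * np.random.randn(*t.shape)
print("sim2real gap (RMSE):", sim2real_gap(sim, real))
```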