Bipedal locomotion skills are challenging to develop. Control strategies often use local linearization of the dynamics in conjunction with reduced-order abstractions to yield tractable solutions. In these model-based control strategies, the controller is often not fully aware of many details, including torque limits, joint limits, and other non-linearities that are necessarily excluded from the control computations for simplicity. Deep reinforcement learning (DRL) offers a promising model-free approach for controlling bipedal locomotion that can more fully exploit the dynamics. However, current results in the machine learning literature are often obtained with ad-hoc simulation models that do not correspond to real hardware. Thus it remains unclear how well DRL will succeed on realizable bipedal robots. In this paper, we demonstrate the effectiveness of DRL using a realistic model of Cassie, a bipedal robot. By formulating a feedback control problem as finding the optimal policy for a Markov Decision Process, we are able to learn robust walking controllers that imitate a reference motion with DRL. Controllers for different walking speeds are learned by imitating simple time-scaled versions of the original reference motion. Controller robustness is demonstrated through several challenging tests, including sensory delay, walking blindly on irregular terrain, and unexpected pushes at the pelvis. We also show that we can interpolate between individual policies and that robustness can be improved with an interpolated policy.
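To make the imitation objective concrete, the following is a minimal sketch of a phase-indexed imitation reward together with the time-scaled reference lookup used to obtain different walking speeds. The weights, the exponential error terms, and the state layout are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical imitation reward: higher when joint positions and
# velocities track the reference motion at the current gait phase.
def imitation_reward(qpos, qvel, ref_qpos, ref_qvel, w_pos=0.7, w_vel=0.3):
    pos_err = np.sum((qpos - ref_qpos) ** 2)
    vel_err = np.sum((qvel - ref_qvel) ** 2)
    return w_pos * np.exp(-2.0 * pos_err) + w_vel * np.exp(-0.1 * vel_err)

# Time-scaled reference lookup: replaying the same reference motion
# faster or slower yields targets for different walking speeds.
def reference_at(t, speed_scale, ref_traj, dt):
    idx = int(t * speed_scale / dt) % len(ref_traj)
    return ref_traj[idx]
```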
We apply fast online trajectory optimization for multi-step motion planning to Cassie, a bipedal robot designed to exploit natural spring-mass locomotion dynamics using lightweight, compliant legs. Our motion planning formulation simultaneously optimizes over center of mass motion, footholds, and center of pressure for a simplified model that combines transverse linear inverted pendulum and vertical spring dynamics. A vertex-based representation of the support area, combined with this simplified dynamic model that admits closed-form integration, leads to a fast nonlinear programming formulation. This optimization problem is continuously solved online in a model predictive control approach. The output of the reduced-order planner is fed into a quadratic-programming-based operational space controller for execution on the full-order system. We present simulation results showing the performance and robustness to disturbances of the planning and control framework. Preliminary results on the physical robot show functionality of the operational space control system, with integration of the trajectory planner a work in progress.
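As a rough illustration of why this simplified model is fast to plan with, the sketch below integrates both the transverse LIP dynamics and the vertical spring dynamics in closed form over a phase with a fixed center of pressure. The parameter names and explicit solutions are the standard ones for these two models; the paper's exact formulation may differ.

```python
import numpy as np

G = 9.81  # gravity [m/s^2]

def lip_step(x0, v0, p, z_nom, t):
    """Closed-form LIP state after time t with constant CoP p."""
    w = np.sqrt(G / z_nom)                  # pendulum frequency
    c, s = np.cosh(w * t), np.sinh(w * t)
    x = p + (x0 - p) * c + (v0 / w) * s
    v = (x0 - p) * w * s + v0 * c
    return x, v

def spring_step(z0, vz0, m, k, l0, t):
    """Closed-form vertical spring-mass state after time t."""
    wz = np.sqrt(k / m)                     # spring natural frequency
    z_eq = l0 - m * G / k                   # static equilibrium height
    c, s = np.cos(wz * t), np.sin(wz * t)
    z = z_eq + (z0 - z_eq) * c + (vz0 / wz) * s
    vz = -(z0 - z_eq) * wz * s + vz0 * c
    return z, vz
```

Because both subsystems have analytic solutions, the planner can evaluate dynamics constraints over whole contact phases without numerical integration, which keeps the nonlinear program small and fast.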
Despite advances in the development of robotic systems, the energy economy of today's robots lags far behind that of biological systems. This is particularly critical for untethered legged locomotion. To elucidate the current state of energy efficiency in legged robotic systems, this paper provides an overview of recent advances in the development of such platforms. The perspectives covered include actuation, leg structure, control, and locomotion principles. We review various robotic actuators exploiting compliance in series and in parallel with the drive-train to permit energy recycling during locomotion. We discuss the importance of limb segmentation from an efficiency standpoint and with respect to the design, dynamics analysis, and control of legged robots. This paper also reviews a number of control approaches allowing for energy-efficient locomotion by exploiting the natural dynamics of the system and by utilizing optimal control approaches that target locomotion energy expenditure. To this end, a set of locomotion principles is studied, elaborating on models for the energetics and dynamics of such systems.
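Since the review centers on energy economy, a common yardstick for comparing robots and animals is the dimensionless cost of transport. A minimal sketch, with purely illustrative numbers:

```python
# Cost of transport: CoT = E / (m * g * d), dimensionless.
# Lower values indicate more economical locomotion.
def cost_of_transport(energy_j, mass_kg, distance_m, g=9.81):
    return energy_j / (mass_kg * g * distance_m)

# e.g. a hypothetical 30 kg robot spending 900 J over 10 m:
# cost_of_transport(900.0, 30.0, 10.0) -> ~0.31
```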
Biological bipeds have long been thought to take advantage of compliance and passive dynamics to walk and run, but realizing robotic locomotion in this fashion has been difficult in practice. Assume The Robot Is A Sphere (ATRIAS) is a bipedal robot designed to take advantage of the inherent stabilizing effects that emerge as a result of tuned mechanical compliance (Table 1). In this article, we describe the mechanics of the biped and how our controller exploits the interplay between passive dynamics and actuation to achieve robust locomotion. We outline our development process for the incremental design and testing of our controllers through rapid iteration. By showtime at the Defense Advanced Research Projects Agency (DARPA) Robotics Challenge (Figure 1), ATRIAS was able to walk robustly, locomote on terrain ranging from asphalt to grass to artificial turf, and traverse changes in surface height as large as 15 cm without planning or visual feedback. Furthermore, ATRIAS can accelerate from rest, transition smoothly to a running gait, and reach a top speed of 2.5 m/s (9 km/h). Reliably achieving such dynamic locomotion in an uncertain environment required rigorous development and testing of the hardware, software, and control algorithms. This endeavor culminated in seven live shows of ATRIAS walking and running, with disturbances and without falling, in front of a live audience at the DARPA Robotics Challenge.

Approaches to Biped Control

Walking and running on two legs is an enduring challenge in robotics. Avoiding falls becomes especially tricky when the terrain is uncertain in both its geometry and rigidity. A promising approach to achieving stable control is to relinquish some authority to purposeful passive dynamics, perhaps by adding mechanical compliance [1] or removing actuators entirely [2]. If the machine's unactuated dynamics are thoughtfully designed, they can passively attenuate disturbances and require smaller adjustments from the controller [3].
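The spring-mass behavior the article refers to is commonly abstracted as the spring-loaded inverted pendulum (SLIP). A minimal sketch of planar SLIP stance dynamics follows; the explicit-Euler integrator and the parameter interfaces are assumptions for illustration, not ATRIAS's actual controller.

```python
import numpy as np

def slip_stance_step(pos, vel, foot, m, k, l0, dt, g=9.81):
    """One explicit-Euler step of planar SLIP stance dynamics.
    pos, vel, foot are 2D arrays; the leg is a massless spring of
    rest length l0 connecting the point-mass CoM to the foothold."""
    leg = pos - foot                        # vector from foot to CoM
    l = np.linalg.norm(leg)
    f_spring = k * (l0 - l) * (leg / l)     # radial force, outward when compressed
    acc = f_spring / m + np.array([0.0, -g])
    return pos + vel * dt, vel + acc * dt
```

In such a model, a well-chosen spring stiffness rejects small terrain disturbances passively, which is the effect the tuned mechanical compliance of ATRIAS is designed to exploit.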
Deep reinforcement learning (DRL) is a promising approach for developing legged locomotion skills. However, the iterative design process that is inevitable in practice is poorly supported by the default methodology. It is difficult to predict the outcomes of changes made to the reward functions, policy architectures, and the set of tasks being trained on. In this paper, we propose a practical method that allows the reward function to be fully redefined on each successive design iteration while limiting the deviation from the previous iteration. We characterize policies via sets of Deterministic Action Stochastic State (DASS) tuples, which represent the deterministic policy's state-action pairs as sampled from the states visited by the trained stochastic policy. New policies are trained using a policy-gradient algorithm that mixes RL-based policy gradients with gradient updates defined by the DASS tuples. The tuples also allow for robust policy distillation to new network architectures. We demonstrate the effectiveness of this iterative-design approach on the bipedal robot Cassie, achieving stable walking with different gait styles at various speeds. We demonstrate the successful transfer of policies learned in simulation to the physical robot without any dynamics randomization, and that variable-speed walking policies for the physical robot can be represented by a small dataset of 5-10k tuples.
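A minimal sketch of the mixed update described above, assuming a PyTorch policy network: the RL policy-gradient surrogate loss is combined with a supervised regression term on DASS tuples, so the policy can be retrained under a redefined reward while staying close to the previous iteration. The loss weighting and interfaces are illustrative assumptions, not the paper's exact algorithm.

```python
import torch

def mixed_loss(policy, pg_loss, dass_states, dass_actions, beta=0.5):
    """pg_loss: policy-gradient surrogate from the RL algorithm.
    The DASS term pulls the policy's mean actions toward the previous
    policy's deterministic actions on previously visited states."""
    pred = policy(dass_states)                        # predicted mean actions
    dass_loss = torch.mean((pred - dass_actions) ** 2)
    return pg_loss + beta * dass_loss
```

The same supervised term, used alone, also supports distillation: a new network architecture can be regressed directly onto the DASS tuples of an existing policy.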