Protective Policy Transfer

Yu, Wenhao; Liu, C. Karen; Turk, Greg

doi:10.48550/arxiv.2012.06662

Cited by 1 publication

(2 citation statements)

References 28 publications

“…We formulate the problem of safe locomotion learning in the context of safe RL. Inspired by prior work [5], [6], [7], … [Fig. 1: We evaluate our algorithm in legged locomotion tasks: catwalk and two-leg balance.]…”

Section: Introduction (mentioning)

confidence: 99%

“…Different from prior methods that learn a safety critic function which predicts the possibility of safety violations [5], [6], [7], we propose a model-based approach to determine when to switch between the two policies based on the knowledge about the system dynamics. In reality, it is often the case that the designer has some knowledge of the system dynamics at hand.…”

Section: Introduction (mentioning)

confidence: 99%

Safe Reinforcement Learning for Legged Locomotion

Yang, Zhang, Luu et al. 2022

Preprint

Self Cite

Designing control policies for legged locomotion is complex due to the under-actuated and non-continuous robot dynamics. Model-free reinforcement learning provides promising tools to tackle this challenge. However, a major bottleneck in applying model-free reinforcement learning in the real world is safety. In this paper, we propose a safe reinforcement learning framework that switches between a safe recovery policy, which prevents the robot from entering unsafe states, and a learner policy, which is optimized to complete the task. The safe recovery policy takes over control when the learner policy violates safety constraints, and hands control back when no future safety violations are predicted. We design the safe recovery policy so that it ensures the safety of legged locomotion while minimally intervening in the learning process. Furthermore, we theoretically analyze the proposed framework and provide an upper bound on the task performance. We verify the proposed framework in four locomotion tasks on a simulated and real quadrupedal robot: efficient gait, catwalk, two-leg balance, and pacing. On average, our method achieves 48.6% fewer falls and comparable or better rewards than the baseline methods in simulation. When deployed on a real-world quadruped robot, our training pipeline enables a 34% improvement in energy efficiency for the efficient gait, 40.9% narrower foot placement in the catwalk, and twice the jumping duration in the two-leg balance. Our method achieves fewer than five falls over 115 minutes of hardware time.
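The switching rule described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D integrator dynamics, the safety limit, both policies, the rollout horizon, and every function name below are assumptions chosen only to show the idea of a model-based check that lets a recovery policy take over when the learner's action is predicted to lead to a violation.

```python
"""Toy sketch of switching between a learner policy and a safe recovery policy.

All dynamics, limits, policies, and names here are illustrative assumptions,
not taken from the cited paper.
"""

DT = 0.1            # assumed integration step
SAFE_LIMIT = 1.0    # assumed safety constraint: |x| <= SAFE_LIMIT


def step(x: float, action: float) -> float:
    """Assumed known dynamics: a 1-D integrator."""
    return x + action * DT


def is_safe(x: float) -> bool:
    """State is safe while it stays inside the assumed limit."""
    return abs(x) <= SAFE_LIMIT


def learner_policy(x: float) -> float:
    """Stand-in for a task policy; it always pushes toward the boundary."""
    return 3.0


def recovery_policy(x: float) -> float:
    """Stand-in for a recovery policy; it drives the state back toward zero."""
    return -2.0 * x


def predicts_violation(x: float, first_action: float, horizon: int = 10) -> bool:
    """Apply first_action once, then roll out the recovery policy on the model.

    Returns True if any predicted state leaves the safe set.
    """
    x = step(x, first_action)
    if not is_safe(x):
        return True
    for _ in range(horizon):
        x = step(x, recovery_policy(x))
        if not is_safe(x):
            return True
    return False


def select_action(x: float) -> tuple[float, str]:
    """Use the learner's action unless the model predicts a safety violation."""
    action = learner_policy(x)
    if predicts_violation(x, action):
        return recovery_policy(x), "recovery"
    return action, "learner"


def run(x0: float = 0.0, steps: int = 50) -> list[tuple[float, str]]:
    """Simulate the switched controller and record (state, active policy)."""
    x, trace = x0, []
    for _ in range(steps):
        action, who = select_action(x)
        x = step(x, action)
        trace.append((x, who))
    return trace


if __name__ == "__main__":
    for state, who in run(steps=12):
        print(f"{state:+.3f}  {who}")
```

In this toy setting control returns to the learner as soon as its proposed action passes the rollout check, which mirrors the "hands control back" behaviour in the abstract at a very coarse level; a real legged robot would need a far richer dynamics model and safety set.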

Section: Introduction (mentioning)

confidence: 99%