Safe Optimal Control Using Stochastic Barrier Functions and Deep Forward-Backward SDEs

Pereira, Marcus A.; Wang, Ziyi; Theodorou, Evangelos A.

doi:10.48550/arxiv.2009.01196

Cited by 4 publications

(5 citation statements)

References 15 publications


“…The path-integral can then be solved using Monte Carlo sampling to obtain the optimal state and action sequence. Recently, this approach has been used in combination with deep networks [228]-[230]. State-space-based methods solve the HJB globally to obtain an optimal non-linear controller applicable on the complete state domain.…”

Section: Continuous-time Reinforcement (mentioning)

confidence: 99%

Inductive Biases in Machine Learning for Robotics and Control

Lutter

2023

Springer Tracts in Advanced Robotics


A fundamental problem of robotics is how one can program a robot to perform a task with its limited embodiment. Classical robotics solves this problem by carefully engineering interconnected modules. The main disadvantage is that this approach is labor-intensive and becomes close to impossible for unstructured environments and observations. Instead of manual engineering, one can solely use black-box models and data. In this paradigm, interconnected deep networks replace all modules of classical robotics. The network parameters are learned using reinforcement learning or self-supervised losses that predict the future. In this thesis, we want to show that these two approaches of classical engineering and black-box deep networks are not mutually exclusive. One can transfer insights from classical robotics to the black-box deep networks and obtain better learning algorithms for robotics and control. To show that incorporating existing knowledge as inductive biases in machine learning algorithms can improve performance, we present three different algorithms: (1) The Differentiable Newton-Euler Algorithm (DiffNEA) reinterprets the classical system identification of rigid bodies. By leveraging automatic differentiation, virtual parameters, and gradient-based optimization, this approach guarantees physically consistent parameters and applies to a wider class of dynamical systems. (2) Deep Lagrangian Networks (DeLaN) combine deep networks with Lagrangian mechanics to learn dynamics models that conserve energy. Using two networks to represent the potential and kinetic energy enables the computation of a physically plausible dynamics model using the Euler-Lagrange equation. (3) Robust Fitted Value Iteration (rFVI) leverages the control-affine dynamics of mechanical systems to extend value iteration to adversarial reinforcement learning with continuous actions. The resulting approach enables the computation of the optimal policy that is robust to changes in the dynamics. Each of these algorithms is evaluated on physical systems and compared to the classical engineering and deep learning baselines. The experiments show that the inductive biases increase performance compared to black-box deep learning approaches. DiffNEA solves Ball-in-Cup on the physical Barrett WAM using offline model-based reinforcement learning and only four minutes of data. The deep network models fail on this task despite using…
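The citation statement for this entry mentions solving the path integral by Monte Carlo sampling. As a rough illustration only, the sketch below shows one common form of that idea: sampled control perturbations are averaged with weights proportional to exp(-cost / lambda). The toy single-integrator system, the quadratic state cost, and the names `path_integral_weights` and `path_integral_update` are assumptions made for this example and are not taken from any of the cited works.

```python
import numpy as np

def path_integral_weights(costs, lam):
    """Normalized weights proportional to exp(-cost / lam) for each sampled rollout."""
    costs = np.asarray(costs, dtype=float)
    # Subtract the minimum cost for numerical stability; the weights are unchanged.
    w = np.exp(-(costs - costs.min()) / lam)
    return w / w.sum()

def path_integral_update(u_nominal, perturbations, costs, lam=1.0):
    """Add the cost-weighted average of the sampled perturbations to the nominal controls.

    u_nominal:     (T, m) nominal control sequence
    perturbations: (K, T, m) sampled control noise, one sequence per rollout
    costs:         (K,) accumulated cost of each rollout
    """
    w = path_integral_weights(costs, lam)
    return u_nominal + np.tensordot(w, perturbations, axes=1)

# Toy setup (hypothetical): x_{t+1} = x_t + u_t * dt, running cost x^2.
rng = np.random.default_rng(0)
K, T, dt, x0 = 256, 20, 0.05, 1.0
u_nominal = np.zeros((T, 1))
perturbations = rng.normal(size=(K, T, 1))

controls = u_nominal[None, :, :] + perturbations       # (K, T, 1)
states = x0 + dt * np.cumsum(controls[:, :, 0], axis=1)  # (K, T)
costs = dt * np.sum(states ** 2, axis=1)                 # (K,)

u_updated = path_integral_update(u_nominal, perturbations, costs, lam=0.1)
```

The temperature `lam` controls how sharply the average concentrates on low-cost rollouts; smaller values approach picking the single best sample.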



“…These methods can be divided into trajectory and state-space based methods. Trajectory based methods solve the stochastic HJB along a trajectory using path integral control [47][48][49] or forward-backward stochastic differential equations [50,51]. State-space based methods solve the HJB globally to obtain an optimal non-linear controller applicable on the complete state domain.…”

Section: Continuous-time Reinforcement Learning (mentioning)

confidence: 99%

Robust Value Iteration for Continuous Control Tasks

Lutter

Mannor

Peters

et al. 2021

Robotics: Science and Systems XVII


When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well. Commonly, the optimal policy overfits to the approximate model and the corresponding state distribution, often resulting in failure to transfer under distributional shifts. In this paper, we present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain and incorporates adversarial perturbations of the system dynamics. The adversarial perturbations encourage an optimal policy that is robust to changes in the dynamics. Utilizing the continuous-time perspective of reinforcement learning, we derive the optimal perturbations for the states, actions, observations and model parameters in closed form. Notably, the resulting algorithm does not require discretization of states or actions. Therefore, the optimal adversarial perturbations can be efficiently incorporated in the min-max value function update. We apply the resulting algorithm to the physical Furuta pendulum and cartpole. By changing the masses of the systems, we evaluate the quantitative and qualitative performance across different model parameters. We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm. Videos of the experiments are shown at https://sites.google.com/view/rfvi


“…The path-integral can then be solved using Monte Carlo sampling to obtain the optimal state and action sequence. Recently, this approach has been used in combination with deep networks [75]-[77]. State-space-based methods solve the HJB globally to obtain an optimal non-linear controller applicable on the complete state domain.…”

Section: Related Work (mentioning)

confidence: 99%

Continuous-Time Fitted Value Iteration for Robust Policies

Lutter

Belousov

Mannor

et al. 2021

Preprint


Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics. Especially for continuous control, solving this differential equation and its extension, the Hamilton-Jacobi-Isaacs equation, is important as it yields the optimal policy that achieves the maximum reward on a given task. In the case of the Hamilton-Jacobi-Isaacs equation, which includes an adversary controlling the environment and minimizing the reward, the obtained policy is also robust to perturbations of the dynamics. In this paper we propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI). These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems to derive the optimal policy and optimal adversary in closed form. This analytic expression simplifies the differential equations and enables us to solve for the optimal value function using value iteration for continuous actions and states as well as the adversarial case. Notably, the resulting algorithms do not require discretization of states or actions. We apply the resulting algorithms to the Furuta pendulum and cartpole. We show that both algorithms obtain the optimal policy. The robustness Sim2Real experiments on the physical systems show that the policies successfully achieve the task in the real world. When changing the masses of the pendulum, we observe that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm. Videos of the experiments are shown at https://sites.google.com/view/rfvi
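The abstract above states that control-affine dynamics and a separable reward give the optimal policy in closed form. A minimal sketch of why, under an assumption the abstract does not state: if the dynamics are x' = a(x) + B(x) u and the action cost is the quadratic 0.5 uᵀRu with R symmetric positive definite, the action-dependent part of the HJB right-hand side is ∇Vᵀ B u − 0.5 uᵀRu, and setting its gradient to zero gives u* = R⁻¹ Bᵀ ∇V. The matrices and the value-gradient below are made-up numbers for illustration, and the function names are hypothetical.

```python
import numpy as np

def optimal_action(B, R, grad_V):
    """Maximizer of grad_V^T B u - 0.5 u^T R u over u, i.e. u* = R^{-1} B^T grad_V.

    Assumes a quadratic action cost with symmetric positive definite R.
    """
    return np.linalg.solve(R, B.T @ grad_V)

def action_term(u, B, R, grad_V):
    """Action-dependent part of the HJB right-hand side for the assumed cost."""
    return grad_V @ (B @ u) - 0.5 * u @ (R @ u)

# Hypothetical two-state, one-input example.
B = np.array([[0.0], [1.0]])     # control matrix B(x) at some state x
R = np.array([[2.0]])            # action-cost weight
grad_V = np.array([0.5, -3.0])   # gradient of the value function at x

u_star = optimal_action(B, R, grad_V)  # here: (1/2) * (-3.0) = -1.5
```

For a general strictly convex action cost, the same stationarity argument applies but the inverse of the cost gradient replaces the linear solve; the quadratic case is used here only because it is the simplest to check by hand.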

