2020
DOI: 10.48550/arxiv.2008.01825
Preprint

Robust Reinforcement Learning using Adversarial Populations

Abstract: Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness, failing catastrophically when the underlying system dynamics are perturbed. The Robust RL formulation tackles this by adding worst-case adversarial noise to the dynamics and constructing the noise distribution as the solution to a zero-sum minimax game. However, existing work on learning solutions to the Robust RL formulation has primarily focused on training a single RL agent against a single adversary…
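For orientation, the zero-sum formulation summarized in the abstract can be written as a minimax objective in which a protagonist policy is trained against an adversary that perturbs the dynamics. The notation below (protagonist \pi_\theta, adversary \bar{\pi}_\phi, perturbation \delta_t, population size n) is a reading aid and is not taken verbatim from the paper.

    % Single-adversary Robust RL: the protagonist maximizes return while an
    % adversary injects worst-case perturbations into the dynamics.
    \max_{\theta}\;\min_{\phi}\;
      \mathbb{E}\Big[\textstyle\sum_{t=0}^{T}\gamma^{t}\, r(s_t, a_t)\Big],
    \qquad
    s_{t+1} = f(s_t, a_t, \delta_t),\quad
    a_t \sim \pi_{\theta}(\cdot \mid s_t),\quad
    \delta_t \sim \bar{\pi}_{\phi}(\cdot \mid s_t).

    % Population variant (the paper's theme, sketched here as an assumption): the
    % protagonist's objective averages over a set of adversaries
    % \{\bar{\pi}_{\phi_1},\dots,\bar{\pi}_{\phi_n}\}, each trained to minimize it.
    \max_{\theta}\;\frac{1}{n}\sum_{i=1}^{n}
      \mathbb{E}_{\pi_{\theta},\,\bar{\pi}_{\phi_i}}
      \Big[\textstyle\sum_{t=0}^{T}\gamma^{t}\, r(s_t, a_t)\Big].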

Cited by 14 publications (21 citation statements)
References 21 publications
“…This can limit the applicability of robust MDPs beyond toy problems. Recent work (36,37,38) applied deep RL to robust decision making, targeting key theoretical and practical hurdles such as (i) how to effectively model uncertainty with deep neural networks, and (ii) how to efficiently solve the min-max optimization (e.g., via sampling or two-player, game-theoretic formulations). These ideas, including adversarial RL and domain randomization, are presented in Sec.…”
Section: M9 Neural (mentioning, confidence: 99%)
“…This method learns an ensemble of Deep Q-Networks (DQN) (103) and defines the risk of an action based on the variance of its value predictions. In another extension of (36), a population of adversaries (rather than a single one) is trained (38), leading to the resulting protagonist being less exploitable by new adversaries. Finally, the work in (104) proposes certified lower bounds for the value predictions from a DQN (103), given bounded observation perturbations.…”
Section: M20 Lagrangian (mentioning, confidence: 99%)
“…It was shown in [Iyengar, 2005] that the robust MDP problem is equivalent to a zero-sum game between the agent and nature. Motivated by this fact, the adversarial training approach, where an adversary perturbs the state transition, was studied in Vinitsky et al. [2020], Pinto et al. [2017], Abdullah et al. [2019], Hou et al. [2020], Rajeswaran et al. [2016], Atkeson and Morimoto [2003], Morimoto and Doya [2005]. This method relies on a simulator in which the state transition can be modified in an arbitrary way.…”
Section: Related Work (mentioning, confidence: 99%)
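As a companion to the excerpt above, the toy simulator below shows what "the adversary perturbs the state transition" can look like in code: a bounded disturbance is injected directly into the dynamics of a point mass. The class, its parameters, and the reward are hypothetical illustrations, not drawn from any of the cited works.

    import numpy as np

    class PerturbedPointMass:
        # Toy simulator sketch (assumed for illustration): the adversary modifies the
        # state transition directly through a bounded disturbance force added inside
        # the dynamics.
        def __init__(self, dt=0.05, delta_max=0.1):
            self.dt = dt
            self.delta_max = delta_max
            self.state = np.zeros(2)               # [position, velocity]

        def step(self, action, adversary_force):
            delta = np.clip(adversary_force, -self.delta_max, self.delta_max)
            pos, vel = self.state
            vel = vel + self.dt * (action + delta)  # perturbed transition
            pos = pos + self.dt * vel
            self.state = np.array([pos, vel])
            reward = -pos**2 - 0.1 * action**2      # protagonist maximizes, adversary minimizes
            return self.state.copy(), reward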
“…Learning is at the core of many modern information systems, with wide-ranging applications in clinical research [1][2][3][4], smart grids [5][6][7], and robotics [8][9][10]. However, it has become clear that learning-based solutions suffer from a critical lack of robustness [11][12][13][14][15][16][17], leading to models that are vulnerable to malicious tampering and unsafe behavior [18][19][20][21][22]. While robustness has been studied in statistics for decades [23][24][25], this issue has been exacerbated by the opacity, scale, and non-convexity of modern learning models, such as convolutional neural networks (CNNs).…”
Section: Introduction (mentioning, confidence: 99%)