2020
DOI: 10.48550/arxiv.2008.01825
Preprint

Robust Reinforcement Learning using Adversarial Populations

Abstract: Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness, failing catastrophically when the underlying system dynamics are perturbed. The Robust RL formulation tackles this by adding worst-case adversarial noise to the dynamics and constructing the noise distribution as the solution to a zero-sum minimax game. However, existing work on learning solutions to the Robust RL formulation has primarily focused on training a single RL agent against a single adversary…
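For orientation, the zero-sum formulation summarized in the abstract can be written as a minimax objective in which a protagonist policy is trained against an adversary that perturbs the dynamics. The notation below (protagonist \pi_\theta, adversary \bar{\pi}_\phi, perturbation \delta_t, population size n) is a reading aid and is not taken verbatim from the paper.

    % Single-adversary Robust RL: the protagonist maximizes return while an
    % adversary injects worst-case perturbations into the dynamics.
    \max_{\theta}\;\min_{\phi}\;
      \mathbb{E}\Big[\textstyle\sum_{t=0}^{T}\gamma^{t}\, r(s_t, a_t)\Big],
    \qquad
    s_{t+1} = f(s_t, a_t, \delta_t),\quad
    a_t \sim \pi_{\theta}(\cdot \mid s_t),\quad
    \delta_t \sim \bar{\pi}_{\phi}(\cdot \mid s_t).

    % Population variant (the paper's theme, sketched here as an assumption): the
    % protagonist's objective averages over a set of adversaries
    % \{\bar{\pi}_{\phi_1},\dots,\bar{\pi}_{\phi_n}\}, each trained to minimize it.
    \max_{\theta}\;\frac{1}{n}\sum_{i=1}^{n}
      \mathbb{E}_{\pi_{\theta},\,\bar{\pi}_{\phi_i}}
      \Big[\textstyle\sum_{t=0}^{T}\gamma^{t}\, r(s_t, a_t)\Big].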

Cited by 14 publications (21 citation statements)
References 21 publications
“…This can limit the applicability of robust MDPs beyond toy problems. Recent work (36,37,38) applied deep RL to robust decision making, targeting key theoretical and practical hurdles such as (i) how to effectively model uncertainty with deep neural networks, and (ii) how to efficiently solve the min-max optimization (e.g., via sampling or two-player, game-theoretic formulations). These ideas, including adversarial RL and domain randomization, are presented in Sec.…”
Section: M9 Neural (mentioning, confidence: 99%)
“…This method learns an ensemble of Deep Q-Networks (DQN) (103) and defines the risk of an action based on the variance of its value predictions. In another extension of (36), a population of adversaries (rather than a single one) is trained (38), leading to the resulting protagonist being less exploitable by new adversaries. Finally, the work in (104) proposes certified lower bounds for the value predictions from a DQN (103), given bounded observation perturbations.…”
Section: M20 Lagrangian (mentioning, confidence: 99%)
“…It was shown in [Iyengar, 2005] that the robust MDP problem is equivalent to a zero-sum game between the agent and nature. Motivated by this fact, the adversarial training approach, where an adversary perturbs the state transition, was studied in Vinitsky et al. [2020], Pinto et al. [2017], Abdullah et al. [2019], Hou et al. [2020], Rajeswaran et al. [2016], Atkeson and Morimoto [2003], Morimoto and Doya [2005]. This method relies on a simulator in which the state transition can be modified in an arbitrary way.…”
Section: Related Work (mentioning, confidence: 99%)
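As a companion to the excerpt above, the toy simulator below shows what "the adversary perturbs the state transition" can look like in code: a bounded disturbance is injected directly into the dynamics of a point mass. The class, its parameters, and the reward are hypothetical illustrations, not drawn from any of the cited works.

    import numpy as np

    class PerturbedPointMass:
        # Toy simulator sketch (assumed for illustration): the adversary modifies the
        # state transition directly through a bounded disturbance force added inside
        # the dynamics.
        def __init__(self, dt=0.05, delta_max=0.1):
            self.dt = dt
            self.delta_max = delta_max
            self.state = np.zeros(2)               # [position, velocity]

        def step(self, action, adversary_force):
            delta = np.clip(adversary_force, -self.delta_max, self.delta_max)
            pos, vel = self.state
            vel = vel + self.dt * (action + delta)  # perturbed transition
            pos = pos + self.dt * vel
            self.state = np.array([pos, vel])
            reward = -pos**2 - 0.1 * action**2      # protagonist maximizes, adversary minimizes
            return self.state.copy(), reward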
“…Learning is at the core of many modern information systems, with wide-ranging applications in clinical research [1][2][3][4], smart grids [5][6][7], and robotics [8][9][10]. However, it has become clear that learning-based solutions suffer from a critical lack of robustness [11][12][13][14][15][16][17], leading to models that are vulnerable to malicious tampering and unsafe behavior [18][19][20][21][22]. While robustness has been studied in statistics for decades [23][24][25], this issue has been exacerbated by the opacity, scale, and non-convexity of modern learning models, such as convolutional neural networks (CNNs).…”
Section: Introduction (mentioning, confidence: 99%)