2020
DOI: 10.48550/arxiv.2007.00691
Preprint

Falsification-Based Robust Adversarial Reinforcement Learning

Abstract: Reinforcement learning (RL) has achieved tremendous progress in solving various sequential decision-making problems, e.g., control tasks in robotics. However, RL methods often fail to generalize to safety-critical scenarios since policies are overfitted to training environments. Previously, robust adversarial reinforcement learning (RARL) was proposed to train an adversarial network that applies disturbances to a system, which improves robustness in test scenarios. A drawback of neural-network-based adversarie…


Cited by 3 publications (2 citation statements)
References 39 publications
“…These adversarial scenarios are valuable for understanding the shortcomings of the controller at early design stages, which may be hard to expose by random simulations. Moreover, once found, these adversarial scenarios can be used to improve the design, e.g., see [19], [21], [43].…”
Section: Introduction (mentioning), confidence: 99%
“…Uesato et al (2019) use previous versions of a system to train a failure classifier that predicts which initial conditions of a system will lead to failure, but their approach is not applicable to sequential decision making problems of the type we consider. Wang, Nair, and Althoff (2020) alternately train an agent and perform safety validation on it to improve robustness. On each iteration, the safety validation algorithm starts with the parameters from the previous iteration to improve efficiency.…”
Section: Introduction (mentioning), confidence: 99%
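The alternating scheme quoted above (train the agent, run falsification-based safety validation on it, feed the discovered adversarial scenarios back into training, and warm-start the falsifier from the previous iteration) can be illustrated with a minimal sketch. All names below (train_policy, run_falsifier, robust_training_loop) are hypothetical placeholders standing in for an RL trainer and a falsifier, not the authors' implementation.

# Minimal sketch (not the paper's code) of the alternating train/falsify loop.
import random

def train_policy(policy, scenarios):
    # Placeholder: a real implementation would run an RL algorithm against the
    # nominal environment plus the recorded adversarial disturbance scenarios.
    return policy + len(scenarios)

def run_falsifier(policy, falsifier_state):
    # Placeholder: a real falsifier would search for disturbance traces that
    # violate a safety specification; it is warm-started from the previous
    # iteration's state to improve efficiency, as noted in the citing paper.
    falsifier_state = falsifier_state + 1
    found = [random.random() for _ in range(2)]  # stand-in adversarial scenarios
    return falsifier_state, found

def robust_training_loop(n_iterations=5):
    policy, falsifier_state, scenarios = 0, 0, []
    for _ in range(n_iterations):
        # Train the protagonist policy against the disturbances found so far.
        policy = train_policy(policy, scenarios)
        # Validate the trained policy and collect new adversarial scenarios.
        falsifier_state, new_scenarios = run_falsifier(policy, falsifier_state)
        scenarios.extend(new_scenarios)
    return policy

if __name__ == "__main__":
    robust_training_loop()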