2019 International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2019.8794293

Risk Averse Robust Adversarial Reinforcement Learning

Abstract: Deep reinforcement learning has recently made significant progress in solving computer games and robotic control tasks. A known problem, though, is that policies overfit to the training environment and may not avoid rare, catastrophic events such as automotive accidents. A classical technique for improving the robustness of reinforcement learning algorithms is to train on a set of randomized environments, but this approach only guards against common situations. Recently, robust adversarial reinforcement learni…
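As context for the setup the abstract alludes to: robust adversarial RL trains a protagonist policy against an adversary that perturbs the dynamics, alternating updates between the two. The sketch below is a minimal toy illustration of that alternating scheme; the 1-D environment, the random-search update standing in for policy gradients, and every name and hyperparameter in it are assumptions for illustration, not the paper's method.

```python
import numpy as np

# Illustrative RARL-style alternating training on a toy problem.
# Everything here is a hypothetical stand-in, not the paper's algorithm.

class ToyEnv:
    """1-D point mass; the protagonist pushes toward 0, the adversary perturbs."""
    def reset(self):
        self.x = np.random.uniform(-1.0, 1.0)
        return self.x

    def step(self, a_pro, a_adv):
        self.x += 0.1 * (a_pro + 0.5 * a_adv)
        reward = -self.x ** 2          # protagonist maximizes this
        return self.x, reward

def rollout(env, w_pro, w_adv, horizon=50):
    x, total = env.reset(), 0.0
    for _ in range(horizon):
        x, r = env.step(np.clip(w_pro * x, -1, 1),
                        np.clip(w_adv * x, -1, 1))
        total += r
    return total

def hill_climb(env, w_pro, w_adv, train_pro, iters=20, sigma=0.1):
    # Crude random-search update standing in for a policy-gradient step.
    for _ in range(iters):
        if train_pro:
            cand = w_pro + sigma * np.random.randn()
            if rollout(env, cand, w_adv) > rollout(env, w_pro, w_adv):
                w_pro = cand
        else:
            cand = w_adv + sigma * np.random.randn()
            # The adversary minimizes the protagonist's return.
            if rollout(env, w_pro, cand) < rollout(env, w_pro, w_adv):
                w_adv = cand
    return w_pro, w_adv

env, w_pro, w_adv = ToyEnv(), 0.0, 0.0
for epoch in range(10):
    # Alternate: improve the protagonist against a fixed adversary,
    # then improve the adversary against the fixed protagonist.
    w_pro, w_adv = hill_climb(env, w_pro, w_adv, train_pro=True)
    w_pro, w_adv = hill_climb(env, w_pro, w_adv, train_pro=False)
print("protagonist gain:", w_pro, "adversary gain:", w_adv)
```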

Cited by 63 publications (36 citation statements)
References 18 publications
“…The computation of the subgradient in (30) requires the exact value of J_c(X_k), which cannot be obtained in the sample-based setting. Thus, we estimate it by…”
Section: B. A Sample-Based Primal-Dual Algorithm (mentioning, confidence: 99%)
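The quote truncates before the estimator itself, so the following is only a plausible reading: a standard sample-based stand-in for the exact constraint cost J_c(X_k) is the empirical mean of sampled trajectory costs. The helper `estimate_Jc` below is hypothetical.

```python
import numpy as np

def estimate_Jc(sample_trajectory_costs):
    """Hypothetical Monte Carlo stand-in for the exact constraint
    cost J_c(X_k): the empirical mean of per-trajectory cumulative
    costs collected under the current policy."""
    costs = np.asarray(sample_trajectory_costs, dtype=float)
    return costs.mean()

# e.g., cumulative constraint costs from four sampled rollouts
print(estimate_Jc([0.8, 1.2, 0.9, 1.1]))  # 1.0
```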
“…Theorem 2 (weighted sup-norm bound): Let V* be the approximate value function solution to (36), and V* be the solution to (9). Then,…”
Section: B. One-Shot Semi-Infinite-Dimensional Convex Program (mentioning, confidence: 99%)
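The bound itself is cut off in the quote, but the norm it names has a standard definition; a sketch, assuming the usual positive weight function w over the state space (the specific w and the bound's constants are not recoverable here):

```latex
% Standard weighted sup-norm over states s; the weight w > 0 is
% problem-specific and not recoverable from the truncated quote.
\|V\|_{\infty, w} \;=\; \sup_{s \in \mathcal{S}} \frac{|V(s)|}{w(s)}
```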
“…On the other hand, entropic risk measures leverage exponential cost functions to simultaneously optimize the average cost and its variance [33], [34]. To account for epistemic uncertainties, the policy gradient (PG) method [35], an RL algorithm, has been leveraged to learn the solution in the value-at-risk setting [36]–[39] and the exponential utility setting [34]. PG algorithms explicitly parameterize the policy and update the control parameters in the direction of the gradient of the performance.…”
Section: Introduction (mentioning, confidence: 99%)
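For the mean/variance trade-off this statement attributes to entropic risk measures, the standard definition and its small-β expansion make the mechanism explicit (a textbook identity, not taken from the cited works):

```latex
% Entropic risk with parameter \beta > 0; the second-order expansion
% (valid for small \beta) shows the simultaneous optimization of the
% average cost and its variance.
\rho_\beta(X) \;=\; \frac{1}{\beta}\log \mathbb{E}\!\left[e^{\beta X}\right]
\;\approx\; \mathbb{E}[X] + \frac{\beta}{2}\,\mathrm{Var}(X)
```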
“…Robust planning in RL: Robustness in RL has been heavily studied, both in the context of robust adversarial RL [Pinto et al., 2017, Pan et al., 2019, Zhang et al., 2020a] and nonstationarity in multi-agent RL settings [Li et al., 2019, Zhang et al., 2020b]. For example, PSRO extends double oracle from state-independent pure strategies to policy-space strategies to be used for multiplayer competitive games [Lanctot et al., 2017].…”
Section: Related Work (mentioning, confidence: 99%)