2021
DOI: 10.1609/aaai.v35i12.17272

WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning

Abstract: Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without considering the tail of the distribution. For instance, in safety-critical domains, worst-case analysis is required to avoid disastrous results. We present a novel reinforcement…
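One common way to make the abstract's distinction concrete (a hedged sketch in assumed notation, not quoted from the paper): an expectation constraint bounds only the mean of the cumulative cost, while a tail-sensitive risk measure such as CVaR, of the kind worst-case methods like WCSAC build on, also bounds the worst α-fraction of cost outcomes.

```latex
% Risk-neutral, expectation-constrained RL (bounds only the mean cost):
\max_{\pi}\ \mathbb{E}\Big[\textstyle\sum_{t}\gamma^{t} r_t\Big]
\quad\text{s.t.}\quad
\mathbb{E}\Big[\textstyle\sum_{t}\gamma^{t} c_t\Big] \le d

% Tail-aware variant (illustrative): constrain a risk measure such as
% CVaR at level \alpha, which also controls the tail of the cost distribution:
\max_{\pi}\ \mathbb{E}\Big[\textstyle\sum_{t}\gamma^{t} r_t\Big]
\quad\text{s.t.}\quad
\mathrm{CVaR}_{\alpha}\Big[\textstyle\sum_{t}\gamma^{t} c_t\Big] \le d
```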

Cited by 52 publications (55 citation statements)
References 22 publications
“…where we denote $\theta$ as the policy parameters. Alternating between maximizing over $\theta$ via any unconstrained reinforcement learning algorithm and minimizing over the Lagrange multiplier $\lambda$ yields a series of Lagrangian-based methods to solve the safe deployment problem [208]. Chow et al. [31] propose PDO to update both primal parameters and dual variables by performing gradient descent based on on-policy estimations of the reward and cost value functions $V_r^{\pi_\theta}(\mu_0)$ and $J_c(\pi_\theta)$.…”
Section: Primal-dual-based Methods
confidence: 99%
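A minimal, self-contained sketch of this alternating primal-dual (PDO-style) update on a toy one-dimensional problem; the toy objectives and all names below are illustrative placeholders, and the actual methods replace them with on-policy estimates of $V_r^{\pi_\theta}(\mu_0)$ and $J_c(\pi_\theta)$.

```python
# Toy Lagrangian primal-dual alternation for the constrained problem
#   max_theta V_r(theta)  s.t.  J_c(theta) <= d
# using L(theta, lam) = V_r(theta) - lam * (J_c(theta) - d).
# Everything here is an illustrative stand-in for policy-gradient estimates.

cost_limit = 1.0                       # d: allowed long-term cost

def V_r(theta):                        # toy "reward return", maximized at theta = 2
    return -(theta - 2.0) ** 2

def J_c(theta):                        # toy "expected cost", grows with theta^2
    return theta ** 2

def grad_V_r(theta):
    return -2.0 * (theta - 2.0)

def grad_J_c(theta):
    return 2.0 * theta

theta, lam = 0.0, 0.0                  # primal (policy) and dual (multiplier) variables
lr_theta, lr_lam = 0.05, 0.1

for _ in range(2000):
    # Primal step: gradient ascent on the Lagrangian w.r.t. theta.
    theta += lr_theta * (grad_V_r(theta) - lam * grad_J_c(theta))
    # Dual step: raise lam while the constraint is violated; project to lam >= 0.
    lam = max(0.0, lam + lr_lam * (J_c(theta) - cost_limit))

# The iterates settle near the constraint boundary: theta ~ 1, J_c(theta) ~ cost_limit.
print(f"theta={theta:.3f}  lambda={lam:.3f}  cost={J_c(theta):.3f}")
```

The multiplier grows only while the constraint is violated and shrinks otherwise, which is the "penalize constraint violations" behavior the surrounding snippets attribute to Lagrangian-based methods.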
“…Safe RL. Constrained optimization techniques are usually adopted to solve safe RL problems (García & Fernández, 2015; Sootla et al., 2022; Yang et al., 2021; Flet-Berliac & Basu, 2022). Lagrangian-based methods use a multiplier to penalize constraint violations (Chow et al., 2017; Tessler et al., 2018; Stooke et al., 2020; Chen et al., 2021b).…”
Section: Related Work
confidence: 99%
“…The Lagrangian method [20] is a popular way to address constrained RL problems by converting them to a dual problem, following constrained optimization theory [23, Chapter 5], and optimizing the Lagrangian multiplier in conjunction with the RL policy. More recent works, such as constrained policy optimization [24], constrained RL with a PID-controlled Lagrange multiplier (PID-Lagrangian) [25], and worst-case soft actor-critic [26], build on the Lagrangian method and make it applicable to deep RL. A drawback of constrained RL is that it cannot guarantee safety in a provable way, as the learned behavior is not formalized or proven.…”
Section: Related Work
confidence: 99%
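A compact restatement of the reduction this snippet describes, in standard constrained-optimization notation (assumed here rather than quoted from [20] or [23]): the constrained problem is relaxed through a Lagrangian whose multiplier is optimized jointly with the policy.

```latex
% Primal safe-RL problem: J_r is the reward return, J_c the cost return, d the budget.
\max_{\theta}\ J_r(\pi_\theta)
\quad\text{s.t.}\quad
J_c(\pi_\theta) \le d

% Lagrangian relaxation and the resulting dual problem: the policy ascends L
% while the multiplier \lambda \ge 0 descends it, penalizing violations adaptively.
L(\theta,\lambda) = J_r(\pi_\theta) - \lambda\,\big(J_c(\pi_\theta) - d\big),
\qquad
\min_{\lambda \ge 0}\ \max_{\theta}\ L(\theta,\lambda)
```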