2021
DOI: 10.1609/aaai.v35i12.17272

WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning

Abstract: Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without considering the tail of the distribution. For instance, in safety-critical domains, worst-case analysis is required to avoid disastrous results. We present a novel reinforcement…
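One common way to make the abstract's distinction concrete (a hedged sketch in assumed notation, not quoted from the paper): an expectation constraint bounds only the mean of the cumulative cost, while a tail-sensitive risk measure such as CVaR, of the kind worst-case methods like WCSAC build on, also bounds the worst α-fraction of cost outcomes.

```latex
% Risk-neutral, expectation-constrained RL (bounds only the mean cost):
\max_{\pi}\ \mathbb{E}\Big[\textstyle\sum_{t}\gamma^{t} r_t\Big]
\quad\text{s.t.}\quad
\mathbb{E}\Big[\textstyle\sum_{t}\gamma^{t} c_t\Big] \le d

% Tail-aware variant (illustrative): constrain a risk measure such as
% CVaR at level \alpha, which also controls the tail of the cost distribution:
\max_{\pi}\ \mathbb{E}\Big[\textstyle\sum_{t}\gamma^{t} r_t\Big]
\quad\text{s.t.}\quad
\mathrm{CVaR}_{\alpha}\Big[\textstyle\sum_{t}\gamma^{t} c_t\Big] \le d
```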

Cited by 52 publications (55 citation statements)
References 22 publications
“…where we denote $\theta$ as the policy parameters. Alternating between maximizing over $\theta$ via any unconstrained reinforcement learning algorithm and minimizing over the Lagrange multiplier $\lambda$ yields a series of Lagrangian-based methods to solve the safe deployment problem [208]. Chow et al. [31] propose PDO to update both primal parameters and dual variables by performing gradient descent based on on-policy estimations of the reward and cost value functions $V_r^{\pi_\theta}(\mu_0)$ and $J_c(\pi_\theta)$.…”
Section: Primal-dual-based Methods
confidence: 99%
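A minimal, self-contained sketch of this alternating primal-dual (PDO-style) update on a toy one-dimensional problem; the toy objectives and all names below are illustrative placeholders, and the actual methods replace them with on-policy estimates of $V_r^{\pi_\theta}(\mu_0)$ and $J_c(\pi_\theta)$.

```python
# Toy Lagrangian primal-dual alternation for the constrained problem
#   max_theta V_r(theta)  s.t.  J_c(theta) <= d
# using L(theta, lam) = V_r(theta) - lam * (J_c(theta) - d).
# Everything here is an illustrative stand-in for policy-gradient estimates.

cost_limit = 1.0                       # d: allowed long-term cost

def V_r(theta):                        # toy "reward return", maximized at theta = 2
    return -(theta - 2.0) ** 2

def J_c(theta):                        # toy "expected cost", grows with theta^2
    return theta ** 2

def grad_V_r(theta):
    return -2.0 * (theta - 2.0)

def grad_J_c(theta):
    return 2.0 * theta

theta, lam = 0.0, 0.0                  # primal (policy) and dual (multiplier) variables
lr_theta, lr_lam = 0.05, 0.1

for _ in range(2000):
    # Primal step: gradient ascent on the Lagrangian w.r.t. theta.
    theta += lr_theta * (grad_V_r(theta) - lam * grad_J_c(theta))
    # Dual step: raise lam while the constraint is violated; project to lam >= 0.
    lam = max(0.0, lam + lr_lam * (J_c(theta) - cost_limit))

# The iterates settle near the constraint boundary: theta ~ 1, J_c(theta) ~ cost_limit.
print(f"theta={theta:.3f}  lambda={lam:.3f}  cost={J_c(theta):.3f}")
```

The multiplier grows only while the constraint is violated and shrinks otherwise, which is the "penalize constraint violations" behavior the surrounding snippets attribute to Lagrangian-based methods.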
“…Safe RL. Constrained optimization techniques are usually adopted to solve safe RL problems (García & Fernández, 2015; Sootla et al., 2022; Yang et al., 2021; Flet-Berliac & Basu, 2022). Lagrangian-based methods use a multiplier to penalize constraint violations (Chow et al., 2017; Tessler et al., 2018; Stooke et al., 2020; Chen et al., 2021b).…”
Section: Related Work
confidence: 99%
“…The Lagrangian method [20] is a popular way to address constrained RL problems by converting them to a dual problem, following constrained optimization theory [23, Chapter 5], and optimizing the Lagrangian multiplier in conjunction with the RL policy. More recent works, such as constrained policy optimization [24], constrained RL with a PID-controlled Lagrange multiplier (PID-Lagrangian) [25], and worst-case soft actor-critic [26], build on the Lagrangian method and make it applicable to deep RL. A drawback of constrained RL is that it cannot guarantee safety in a provable way, as the learned behavior is not formalized or proven.…”
Section: Related Work
confidence: 99%
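A compact restatement of the reduction this snippet describes, in standard constrained-optimization notation (assumed here rather than quoted from [20] or [23]): the constrained problem is relaxed through a Lagrangian whose multiplier is optimized jointly with the policy.

```latex
% Primal safe-RL problem: J_r is the reward return, J_c the cost return, d the budget.
\max_{\theta}\ J_r(\pi_\theta)
\quad\text{s.t.}\quad
J_c(\pi_\theta) \le d

% Lagrangian relaxation and the resulting dual problem: the policy ascends L
% while the multiplier \lambda \ge 0 descends it, penalizing violations adaptively.
L(\theta,\lambda) = J_r(\pi_\theta) - \lambda\,\big(J_c(\pi_\theta) - d\big),
\qquad
\min_{\lambda \ge 0}\ \max_{\theta}\ L(\theta,\lambda)
```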