2023
DOI: 10.1109/ojcsys.2023.3256305

Provably Safe Reinforcement Learning via Action Projection Using Reachability Analysis and Polynomial Zonotopes

Abstract: While reinforcement learning produces very promising results for many applications, its main disadvantage is the lack of safety guarantees, which prevents its use in safety-critical systems. In this work, we address this issue by a safety shield for nonlinear continuous systems that solve reach-avoid tasks. Our safety shield prevents applying potentially unsafe actions from a reinforcement learning agent by projecting the proposed action to the closest safe action. This approach is called action projection and…
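
The idea of projecting a proposed action to the closest safe action can be illustrated with a minimal sketch. It assumes, purely for illustration, that the set of safe actions is already known and is an axis-aligned box; the paper itself obtains safe actions through reachability analysis with polynomial zonotopes, which this sketch does not attempt to reproduce. The function name `project_to_box` and the bounds are hypothetical.

```python
from typing import List, Sequence


def project_to_box(
    action: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> List[float]:
    """Return the point of the box [lower, upper] closest to `action`.

    Assumption (not from the paper): the safe action set is an
    axis-aligned box. For a box, the Euclidean projection reduces to
    clipping each coordinate independently.
    """
    if not (len(action) == len(lower) == len(upper)):
        raise ValueError("action, lower and upper must have the same length")
    if any(lo > hi for lo, hi in zip(lower, upper)):
        raise ValueError("each lower bound must not exceed its upper bound")
    # Clip every coordinate into its interval [lo, hi].
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, lower, upper)]


# Hypothetical two-dimensional example: the agent proposes an action
# that violates both bounds, and the shield replaces it.
proposed = [1.5, -0.2]
safe = project_to_box(proposed, lower=[-1.0, 0.0], upper=[1.0, 1.0])
```

An action that already lies inside the box is returned unchanged, so under this assumption the shield only intervenes when the agent's proposal is unsafe.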

Citations: cited by 17 publications (24 citation statements)
References: 59 publications
“…This significantly increases the challenge of verifying safe actions because there are infinite individual continuous actions in a continuous action space. One approach could be obtaining rule-compliant state sets as proposed in [58] and correcting actions proposed by the agent to safe actions, e.g., with action projection as in [59].…”
Section: B Discussion (mentioning)
confidence: 99%
“…On the contrary, set-based predictions using reachability analysis [54] provide an over-approximation of all feasible future movements of other traffic participants that adhere to traffic rules; thus, safety can be guaranteed. Among all previous works, the closest to ours are [11], [42]. Kochdumper et al [42] propose a reachability-based safety shield for RL controllers for general nonlinear systems.…”
Section: A Literature Review (mentioning)
confidence: 99%
“…For example, α,β-Crown was the top performer on last year's NN verification competition [55], able to verify FFNN, CNN and SSNNs, but it lacks support for neural ODEs or NNCS. There exist other tools that focus more on the verification of NNCS such as Verisig [34,35], Juliareach [63], ReachNN [17,33], Sherlock [16], RINO [26], VenMas [1], POLAR [32], and CORA [3,42]. However, their support is limited to NNCS with a linear, nonlinear ODE or hybrid automata as the plant model, and a FFNN as the controller.…”
Section: Related Work (mentioning)
confidence: 99%