2019
DOI: 10.1007/978-3-030-17462-0_28

Verifiably Safe Off-Model Reinforcement Learning

Abstract: The desire to use reinforcement learning in safety-critical settings has inspired a recent interest in formal methods for learning algorithms. Existing formal methods for learning and optimization primarily consider the problem of constrained learning or constrained optimization. Given a single correct model and associated safety constraint, these approaches guarantee efficient learning while provably avoiding behaviors outside the safety constraint. Acting well given an accurate environmental model is an important…
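
To make the abstract's constrained-learning setting concrete, here is a minimal Python sketch (not from the paper; the 1-D braking model, the action set, and helper names such as safe_actions are illustrative assumptions): the learner may only pick actions whose model-predicted successor state still satisfies the safety constraint, so exploration stays inside the constraint rather than being checked after the fact.

import random

# Toy single-model setting: a point vehicle with position x and velocity v.
# Safety constraint: never reach the obstacle at position X_OBST.
X_OBST = 10.0
DT = 0.1
B_MAX = 2.0                       # maximum braking deceleration
ACTIONS = [-B_MAX, 0.0, 1.0]      # brake, coast, accelerate

def model_step(x, v, a, dt=DT):
    """The single assumed model of the environment (also used as the
    'real' dynamics in this toy)."""
    return x + v * dt, max(0.0, v + a * dt)

def is_safe(x, v):
    """Safety constraint: maximum braking can still stop before the
    obstacle, with one time step of slack for the discretization."""
    return x + v * DT + v * v / (2 * B_MAX) < X_OBST

def safe_actions(x, v):
    """The constrained action set: actions whose model-predicted successor
    state still satisfies the safety constraint."""
    return [a for a in ACTIONS if is_safe(*model_step(x, v, a))]

# Any learner (here: a random one) restricted to the safe action set.
x, v = 0.0, 0.0
for _ in range(500):
    a = random.choice(safe_actions(x, v) or [-B_MAX])  # brake if nothing else
    x, v = model_step(x, v, a)
    assert x < X_OBST, "safety constraint violated"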

Cited by 42 publications (28 citation statements)
References 27 publications

“…In other learning-enabled systems, falsification and testing-based approaches [12,13,45] have shown significant promise in enhancing the safety of systems where perception components and neural networks interact with the physical world. Finally, there is significant related work in the domain of safe reinforcement learning [2,15,47,59], and combining guarantees from NNV with those provided in these methods would be interesting to explore.…”
Section: Related Work (mentioning; confidence: 99%)
“…In that case, quantitative versions of ModelPlex monitors serve as reward signals that tend to pull the system back from unsafe into safe space. Beyond the experimental observation that this enables safe recovery outside well-modeled parts of the world, it is possible to give rigorous safety proofs of the resulting behavior of the learning CPS for the case of multiple possible models that are not all wrong, or that can be modified to safely fit reality with verification-preserving model updates [Fulton and Platzer, 2019]. The basic idea is to use the conjunction of the ModelPlex monitors of all plausible models to determine which action is safe, while discarding models whose predictions did not end up happening.…”
Section: Safe Learning in CPS (mentioning; confidence: 99%)
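
A minimal Python sketch of the multi-model idea in the excerpt above (the two candidate models, the 1-D safety limit, and the tolerance are illustrative assumptions, not the paper's actual ModelPlex monitors): an action counts as safe only if every still-plausible model's monitor accepts it, models falsified by the observed transition are discarded, and the observed monitor margin doubles as a reward signal pulling the learner back toward safety.

# Toy 1-D setting: safety means the state x stays below the limit X_MAX.
X_MAX = 10.0

# Two candidate models of the environment (the action gain differs); the
# learner does not know in advance which one describes reality.
MODELS = {"slow": 0.5, "fast": 1.5}       # model: x_next = x + gain * a
plausible = dict(MODELS)                  # models not yet falsified

def margin(x_next):
    """Quantitative monitor: signed distance from the safety boundary.
    Negative values mean the safety condition is violated."""
    return X_MAX - x_next

def action_is_safe(x, a):
    """Conjunction of monitors: the action must be predicted safe by every
    model that has not been falsified yet."""
    return all(margin(x + gain * a) > 0 for gain in plausible.values())

def observe(x, a, x_next, tol=0.1):
    """After acting, discard models whose prediction did not come true, and
    return the observed monitor margin as a reward signal that tends to pull
    the learner back into the safe region."""
    for name, gain in list(plausible.items()):
        if abs((x + gain * a) - x_next) > tol:
            del plausible[name]
    return margin(x_next)

# Example: at x = 9, action a = 1 is rejected because the "fast" model
# predicts a violation; after observing x_next = 9.5, "fast" is discarded.
print(action_is_safe(9.0, 1.0))           # False
print(observe(9.0, 1.0, 9.5), plausible)  # 0.5 {'slow': 0.5}
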
“…KeYmaera X implements the transfer of safety proofs for CPS models to CPS implementations by synthesizing provably correct runtime monitors with ModelPlex [Mitsch and Platzer, 2016b], which result in CPS executables that are formally verified in a chain of theorem provers [Bohrer et al., 2018]. ModelPlex is also the basis for enabling safe artificial intelligence in cyber-physical systems [Platzer, 2019a] by wrapping reinforcement learning in a verified safety sandbox [Fulton and Platzer, 2018] and steering back toward safety when outside the confines of a well-modeled part of the system behavior [Fulton and Platzer, 2019]. These results, which will be surveyed here, provide rigorous safety guarantees for CPS implementations (even those that involve machine learning) without having to deal with the entire complexity of their implementation during verification.…”
Section: Introduction (mentioning; confidence: 99%)
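
As a rough illustration of the sandbox idea (the monitor condition and fallback below are toy stand-ins in a braking setting, not the monitors KeYmaera X or ModelPlex actually synthesize), the wrapper passes a learned action through only when the controller monitor accepts it and otherwise substitutes the proven-safe fallback:

# Toy stand-ins for synthesized monitors: the vehicle at position x with
# velocity v must never reach the obstacle at x = 10.
DT, B_MAX, X_OBST = 0.1, 2.0, 10.0

def controller_monitor(x, v, a):
    """Stand-in controller monitor: accept the action only if, after one
    step, maximum braking can still stop before the obstacle."""
    x_next, v_next = x + v * DT, max(0.0, v + a * DT)
    return x_next + v_next ** 2 / (2 * B_MAX) < X_OBST

def fallback(x, v):
    """Proven-safe fallback action: brake as hard as possible."""
    return -B_MAX

def sandbox(learned_policy, x, v):
    """Verified safety sandbox around an (unverified) learned policy."""
    a = learned_policy(x, v)
    return a if controller_monitor(x, v, a) else fallback(x, v)

# An aggressive learned policy is passed through far from the obstacle but
# overridden by the fallback close to it.
aggressive = lambda x, v: 1.0
print(sandbox(aggressive, 0.0, 0.0))   # 1.0  (monitor accepts)
print(sandbox(aggressive, 9.0, 3.0))   # -2.0 (fallback engaged)
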
“…The diver's oxygen consumption will depend upon both heart rate and several user-specific parameters (b, hr_ss, and τ) fit using data from experiments [4,7]. Future work will incorporate system identification into this model using [2] and will evaluate a prototype based on our model.…”
Section: The Model (mentioning; confidence: 99%)
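
The excerpt does not spell out the diver model; purely as an illustrative assumption (not the cited paper's actual equations), one common reading of parameters like hr_ss, τ, and b is a first-order heart-rate response toward a steady-state rate with time constant τ, with oxygen use scaled from heart rate by the per-user factor b:

def dive_oxygen(hr0, hr_ss, tau, b, dt=1.0, steps=600):
    """Illustrative guess, not the cited paper's model: first-order
    heart-rate dynamics toward hr_ss with time constant tau, and oxygen use
    proportional to heart rate via b. Units: hr in bpm, tau and dt in
    seconds, b in liters of O2 per heartbeat."""
    hr, o2_used = hr0, 0.0
    for _ in range(steps):
        hr += (hr_ss - hr) / tau * dt          # d(hr)/dt = (hr_ss - hr)/tau
        o2_used += b * hr * (dt / 60.0)        # beats this step, times b
    return hr, o2_used

# Ten simulated minutes of exertion for one (made-up) diver parameterization.
print(dive_oxygen(hr0=70.0, hr_ss=120.0, tau=90.0, b=0.005))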