2018
DOI: 10.1609/aaai.v32i1.11797

Safe Reinforcement Learning via Shielding

Abstract: Reinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during learning or execution phases. We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic. To this end, given the temporal logic specification that is to be obeyed by the learning system, we propose to synthesize a reactive system called a shield. The shield monitors the actions from the learner and corrects them only if the chosen action caus…
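
The monitor-and-correct idea in the abstract can be illustrated with a toy sketch. This is not the paper's construction: the paper synthesizes the shield from a temporal logic specification, whereas the sketch below assumes a hand-written one-step safety check on a hypothetical 10-cell corridor. All names (`UNSAFE_STATES`, `step`, `is_safe`, `shield`, `run_episode`) are invented for illustration.

```python
import random
from typing import List

State = int
Action = int  # -1 = step left, 0 = stay, +1 = step right

# Assumption: hazard cells sit at both ends of a 10-cell corridor (cells 0..9).
UNSAFE_STATES = {0, 9}
ACTIONS: List[Action] = [-1, 0, 1]


def step(state: State, action: Action) -> State:
    """Deterministic toy dynamics, clamped to the corridor."""
    return max(0, min(9, state + action))


def is_safe(state: State, action: Action) -> bool:
    """Hand-written one-step check: the successor must not be a hazard cell."""
    return step(state, action) not in UNSAFE_STATES


def shield(state: State, proposed: Action) -> Action:
    """Pass the learner's action through if it is safe; otherwise substitute a safe one."""
    if is_safe(state, proposed):
        return proposed
    safe = [a for a in ACTIONS if is_safe(state, a)]
    if not safe:
        raise RuntimeError("no safe action available in this state")
    return safe[0]


def run_episode(steps: int, seed: int = 0, start: State = 5) -> List[State]:
    """Drive a random 'learner' through the shield and record the visited states."""
    rng = random.Random(seed)
    state = start
    visited = [state]
    for _ in range(steps):
        proposed = rng.choice(ACTIONS)      # stand-in for the learner's choice
        state = step(state, shield(state, proposed))
        visited.append(state)
    return visited


if __name__ == "__main__":
    trace = run_episode(steps=1000)
    print("entered a hazard cell:", any(s in UNSAFE_STATES for s in trace))
```

The shield only overrides the learner when the proposed action would lead into a hazard cell, so the learner is otherwise free to explore. A one-step check is enough here only because "stay" is always safe in this toy corridor; richer specifications need the lookahead that a synthesized shield provides.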

Cited by 385 publications (174 citation statements)
References: 12 publications

“…A promising avenue is to extend single-agent methods for safe reinforcement learning (e.g. shielding [90]) and offline evolution with safe online adaptation (e.g. map-based constrained Bayesian optimisation [83]) to the decentralised multi-agent setting.…”
Section: Discussion (mentioning)
confidence: 99%
“…Thus, safe RL has great potential to conclude a model-free policy that could provide exploration for unknown system dynamics (García & Fernández, 2015). Combined with classic control techniques like formal methods (Alshiekh et al, 2018) and Model Predictive Control (MPC) (Zanon & Gros, 2020), safe policy is synthesised with theoretical guarantees. To maintain the property of forward invariance in a safe set, CBF provides a manner to obtain provably and scalable collision-free behaviours.…”
Section: Autonomous Driving (mentioning)
confidence: 99%
“…One approach is to integrate adaptive control with standard machine learning methods, such as NN [63], GP [25] and DNN [17,33,37]. The safety properties are usually considered on the whole system with some parts being learned in MPC problems [9,36], shielding [2], Control Barrier Functions (CBFs) [14], Hamiltonian analysis [5]. These approaches can guarantee safety for tasks such as stabilization [34,67] and tracking [22,49].…”
Section: A Related Work 1) Safety and Learning (mentioning)
confidence: 99%
“…Safe exploration [47] and safe optimization [60] of MDPs under unknown or selected cost functions can be formulated. Constrained MDPs are also common in RL tasks with Lagrangian methods [18] and generalized Lyapunov/barrier functions [14,19,21] and shielding [2]. Nonetheless much of the work remains confined to rather naive simulated tasks, such as moving a 2D agent on a grid map.…”
Section: A Related Work 1) Safety and Learning (mentioning)
confidence: 99%