2022
DOI: 10.1007/s11334-022-00480-4

Online shielding for reinforcement learning

Abstract: Despite recent impressive results in reinforcement learning (RL), safety remains one of the major research challenges in RL. RL is a machine-learning approach to determining near-optimal policies in Markov decision processes (MDPs). In this paper, we consider the setting where the safety-relevant fragment of the MDP is given together with a temporal logic safety specification, and where many safety violations can be avoided by planning ahead a short time into the future. We propose an approach for online safety s…
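
The abstract states only that the safety-relevant MDP fragment and a safety specification are given, and that violations are avoided by planning a short time ahead. The following is a minimal sketch of such a look-ahead under several assumptions that are not taken from the paper: the specification is a simple invariant ("never enter an unsafe state"), the MDP fragment is a small explicit dictionary, and an action is allowed if its probability of staying safe over a finite horizon is at least `delta` times that of the best action. The names `safety_values`, `allowed_actions`, `horizon` and `delta` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding of the safety-relevant MDP fragment:
# mdp[state][action] is a list of (successor, probability) pairs.
MDP = Dict[str, Dict[str, List[Tuple[str, float]]]]


def safety_values(mdp: MDP, unsafe: Set[str], steps: int) -> Dict[str, float]:
    """Maximal probability of avoiding `unsafe` for the next `steps` steps."""
    states = set(mdp) | {s2 for acts in mdp.values()
                         for succ in acts.values() for s2, _ in succ}
    # With zero steps left, a state is safe iff it is not itself unsafe.
    v = {s: 0.0 if s in unsafe else 1.0 for s in states}
    for _ in range(steps):
        new_v = {}
        for s in states:
            if s in unsafe:
                new_v[s] = 0.0
            elif not mdp.get(s):
                # Assumption: states without modelled actions count as safe.
                new_v[s] = 1.0
            else:
                # Bellman step: the best action maximises the safety probability.
                new_v[s] = max(sum(p * v[s2] for s2, p in succ)
                               for succ in mdp[s].values())
        v = new_v
    return v


def allowed_actions(mdp: MDP, unsafe: Set[str], state: str,
                    horizon: int, delta: float) -> Set[str]:
    """Actions whose `horizon`-step safety value is at least delta * best (horizon >= 1)."""
    v = safety_values(mdp, unsafe, horizon - 1)
    q = {a: sum(p * v[s2] for s2, p in succ) for a, succ in mdp[state].items()}
    best = max(q.values())
    return {a for a, val in q.items() if val >= delta * best}


# Toy fragment: "right" crashes with probability 0.5.
mdp: MDP = {
    "s0": {"left": [("s1", 1.0)], "right": [("crash", 0.5), ("s1", 0.5)]},
    "s1": {"stay": [("s1", 1.0)]},
}
print(allowed_actions(mdp, {"crash"}, "s0", horizon=2, delta=0.9))  # {'left'}
```

Because the analysis is run only for the current state and a short horizon, its cost depends on the size of the reachable fragment rather than on the full MDP; a relative threshold keeps at least one action (the safest) allowed.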

Cited by 9 publications (3 citation statements). References: 28 publications.

“…Online shielding lacks worst-case computation time guarantees, potentially allowing the agent to reach the next decision state before the shield determines which action to block. It is suitable in scenarios where alternative actions, like "waiting," can be taken if safety analysis is not completed promptly [43].…”
Section: Reinforcement Learning (mentioning; confidence: 99%)
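
The statement above notes that online shielding has no worst-case bound on computation time and suits settings where a fallback such as "waiting" exists. A minimal sketch of that fallback, under stated assumptions: the shield's analysis is stood in for by a toy bounded value iteration over a small explicit MDP, each decision has a wall-clock budget `budget_s`, and `wait_action` is assumed to be always safe in the environment. The function names, parameters, and the rule of substituting the safest action when the proposal is blocked are hypothetical, not the paper's method.

```python
import time
from typing import Dict, List, Optional, Set, Tuple

MDP = Dict[str, Dict[str, List[Tuple[str, float]]]]


def safety_values_until(mdp: MDP, unsafe: Set[str], steps: int,
                        deadline: float) -> Optional[Dict[str, float]]:
    """Bounded value iteration that gives up (returns None) once `deadline` passes."""
    states = set(mdp) | {s2 for acts in mdp.values()
                         for succ in acts.values() for s2, _ in succ}
    v = {s: 0.0 if s in unsafe else 1.0 for s in states}
    for _ in range(steps):
        # Checked between iterations only; a single long iteration is not preempted.
        if time.monotonic() > deadline:
            return None  # analysis not finished before the next decision
        v = {s: 0.0 if s in unsafe
             else (max(sum(p * v[s2] for s2, p in succ) for succ in mdp[s].values())
                   if mdp.get(s) else 1.0)
             for s in states}
    return v


def online_shield_step(mdp: MDP, unsafe: Set[str], state: str, proposed: str,
                       horizon: int, delta: float, budget_s: float,
                       wait_action: str = "wait") -> str:
    """Return the action to execute: the proposal, a safer substitute, or `wait_action`."""
    deadline = time.monotonic() + budget_s
    v = safety_values_until(mdp, unsafe, horizon - 1, deadline)
    if v is None:
        return wait_action  # assumed always-safe fallback
    q = {a: sum(p * v[s2] for s2, p in succ) for a, succ in mdp[state].items()}
    best = max(q.values())
    if q[proposed] >= delta * best:
        return proposed  # the proposal passes the shield
    return max(q, key=q.get)  # assumption: substitute the safest action


mdp: MDP = {
    "s0": {"left": [("s1", 1.0)], "right": [("crash", 0.5), ("s1", 0.5)]},
    "s1": {"stay": [("s1", 1.0)]},
}
print(online_shield_step(mdp, {"crash"}, "s0", "right", 2, 0.9, budget_s=1.0))
```

With a generous budget the risky proposal `"right"` is replaced by `"left"`; with a budget that has already expired, the agent waits instead of acting on an incomplete analysis.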

“…In contrast, the shielding technique [37] deploys a shield to directly forestall the agent from taking actions that might potentially breach safety regulations during the exploration phase of DRL [38]. However, the shield's strictness may sometimes hinder the learning agent's ability to effectively explore the environment and discover its optimal policy [43].…”
Section: Introduction (mentioning; confidence: 99%)
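
The statement above describes a shield that blocks potentially unsafe actions during exploration, and warns that an overly strict shield can hinder exploration. A minimal sketch of both points, assuming a tabular agent with ε-greedy exploration (a hypothetical setup, not the algorithm of the cited works): the shield's set of allowed actions is used as an action mask, so both random exploration and greedy exploitation only ever pick allowed actions. `shielded_epsilon_greedy` and its parameters are illustrative names.

```python
import random
from typing import Dict, Set


def shielded_epsilon_greedy(q_values: Dict[str, float], allowed: Set[str],
                            epsilon: float, rng: random.Random) -> str:
    """Epsilon-greedy action selection restricted to shield-allowed actions."""
    if not allowed:
        raise ValueError("the shield blocked every action")
    candidates = sorted(allowed)  # fixed order so runs are reproducible
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore, but only among allowed actions
    return max(candidates, key=lambda a: q_values.get(a, 0.0))  # exploit


rng = random.Random(0)
q = {"left": 0.1, "right": 5.0}
# A strict shield that allows only "left": "right" is never tried, even when exploring.
picks = {shielded_epsilon_greedy(q, {"left"}, epsilon=1.0, rng=rng) for _ in range(100)}
print(picks)  # {'left'}
```

The usage lines illustrate the trade-off: even with ε = 1 and a higher estimated value for `"right"`, the masked agent never samples it, so it can never learn whether that action would in fact have been acceptable.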