2019 IEEE/ACM 2nd International Workshop on Robotics Software Engineering (RoSE)
DOI: 10.1109/rose.2019.00011
A Runtime Monitoring Framework to Enforce Invariants on Reinforcement Learning Agents Exploring Complex Environments

Cited by 10 publications (7 citation statements)
References 19 publications
“…Also, integrated monitoring of TSCs that characterize proper driving or critical traffic scenarios could be used to auto-trigger switching of automated driving levels, provide warning to the driver or to the driving function developers. Additionally, our concept can also help to improve the efficiency of AI driving function training, in particular for Active Learning and Reinforcement Learning [21], e.g., by supplying an additional criterion for early termination of training runs. Last but not least, the above application cases are not limited to only the automotive domain.…”
Section: Discussion (mentioning citation)
confidence: 99%
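
As an illustration of the early-termination application mentioned in this excerpt, the sketch below shows a training loop that ends an episode as soon as a runtime monitor flags a critical scenario. The `scenario_monitor` interface, the agent methods, and the gym-style `step` return are assumptions made for the example; they are not part of the cited work's code.

```python
# Illustrative sketch only: the monitor and agent interfaces are hypothetical.
def train_with_monitor(env, agent, scenario_monitor, max_episodes=1000):
    """Run RL training, ending an episode early when the runtime monitor
    reports a critical scenario (an additional termination criterion)."""
    for episode in range(max_episodes):
        state = env.reset()
        done = False
        while not done:
            action = agent.act(state)
            next_state, reward, done, info = env.step(action)
            # Extra criterion: abort the run if the monitor flags a
            # critical scenario, instead of waiting for the normal `done`.
            if scenario_monitor.is_critical(next_state, info):
                done = True
            agent.learn(state, action, reward, next_state, done)
            state = next_state
```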
“…Safety: The core methodology of reinforcement learning is trial-and-error, i.e., accumulating experience through randomly taking actions to improve the quality of policy, which may lead the self-learning adaptive system to unsafe states [33]. A preliminary study on this problem has been conducted [34]–[36], but most of the SLASs developed so far do not have an effective methodology to resolve the problem. Thrashing: When a violation (referring to the difference between the offline assumptions and the real environment-system dynamics) is detected, MeRAP goes back to the meta policy and uses it to re-plan the policy.…”
Section: Discussion (mentioning citation)
confidence: 99%
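
A minimal sketch of the violation-and-fallback behaviour described in this excerpt, assuming an offline transition model with `predict`/`distance` methods and a `meta_policy` exposing `replan`; these names and the divergence threshold are hypothetical stand-ins and do not reproduce the actual MeRAP interface.

```python
# Hypothetical interfaces: model, policy, and meta_policy are assumptions.
def monitored_step(env, policy, meta_policy, model, state, threshold=0.1):
    """Take one step; if the observed transition diverges from the offline
    model beyond `threshold`, fall back to the meta policy and re-plan."""
    action = policy.act(state)
    next_state, reward, done, info = env.step(action)
    predicted = model.predict(state, action)            # offline assumption
    violation = model.distance(predicted, next_state)   # observed mismatch
    if violation > threshold:
        # Offline assumptions no longer hold: re-plan from the meta policy.
        policy = meta_policy.replan(state)
    return policy, next_state, reward, done
```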
“…ii) Monitoring based on inconsistencies during inference: These methods focus on detecting inconsistencies at runtime to avoid the robot making catastrophic decisions when deployed in a new environment. In Mallozzi et al. (2019), they propose a method to enforce certain properties (including any safety-critical requirements), which they call invariants, that the agent has to respect at all times while exploring complex partially observable environments using reinforcement learning. Their method, called WiseML, acts as a safety envelope over any existing reinforcement learning algorithm and prevents the agent from taking actions that violate the specified invariants.…”
Section: A. Online Methods (mentioning citation)
confidence: 99%
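
The safety-envelope idea summarised in this excerpt can be pictured as a thin wrapper around any agent's action selection: each proposed action is checked against the specified invariants and, if it would violate one, replaced with a known-safe alternative. The sketch below is a minimal illustration under assumed interfaces (an `act` method, invariant callables, a safe-action set); it is not WiseML's actual implementation.

```python
import random

# Illustrative sketch; all interfaces here are assumptions, not WiseML's API.
class SafetyEnvelope:
    """Wraps any RL agent and blocks actions that violate given invariants."""

    def __init__(self, agent, invariants, safe_actions):
        self.agent = agent              # any agent exposing .act(state)
        self.invariants = invariants    # callables: (state, action) -> bool
        self.safe_actions = safe_actions

    def act(self, state):
        action = self.agent.act(state)
        # Allow the proposed action only if every invariant holds.
        if all(inv(state, action) for inv in self.invariants):
            return action
        # Otherwise fall back to a randomly chosen invariant-respecting action.
        allowed = [a for a in self.safe_actions
                   if all(inv(state, a) for inv in self.invariants)]
        return random.choice(allowed) if allowed else action
```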