2019 IEEE/ACM 2nd International Workshop on Robotics Software Engineering (RoSE)
DOI: 10.1109/rose.2019.00011
A Runtime Monitoring Framework to Enforce Invariants on Reinforcement Learning Agents Exploring Complex Environments

Cited by 10 publications (7 citation statements)
References 19 publications
“…Also, integrated monitoring of TSCs that characterize proper driving or critical traffic scenarios could be used to auto-trigger switching of automated driving levels, provide warning to the driver or to the driving function developers. Additionally, our concept can also help to improve the efficiency of AI driving function training, in particular for Active Learning and Reinforcement Learning [21], e.g., by supplying an additional criterion for early termination of training runs. Last but not least, the above application cases are not limited to only the automotive domain.…”
Section: Discussion (mentioning citation)
confidence: 99%
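
As an illustration of the early-termination application mentioned in this excerpt, the sketch below shows a training loop that ends an episode as soon as a runtime monitor flags a critical scenario. The `scenario_monitor` interface, the agent methods, and the gym-style `step` return are assumptions made for the example; they are not part of the cited work's code.

```python
# Illustrative sketch only: the monitor and agent interfaces are hypothetical.
def train_with_monitor(env, agent, scenario_monitor, max_episodes=1000):
    """Run RL training, ending an episode early when the runtime monitor
    reports a critical scenario (an additional termination criterion)."""
    for episode in range(max_episodes):
        state = env.reset()
        done = False
        while not done:
            action = agent.act(state)
            next_state, reward, done, info = env.step(action)
            # Extra criterion: abort the run if the monitor flags a
            # critical scenario, instead of waiting for the normal `done`.
            if scenario_monitor.is_critical(next_state, info):
                done = True
            agent.learn(state, action, reward, next_state, done)
            state = next_state
```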
“…Safety: The core methodology of reinforcement learning is trial-and-error, i.e., accumulating experience through randomly taking actions to improve the quality of policy, which may lead the self-learning adaptive system to unsafe states [33]. A preliminary study on this problem has been conducted [34]–[36], but most of the SLASs developed so far do not have an effective methodology to resolve the problem. Thrashing: When a violation (referring to the difference between the offline assumptions and the real environment-system dynamics) is detected, MeRAP goes back to the meta policy and uses it to re-plan the policy.…”
Section: Discussion (mentioning citation)
confidence: 99%
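
A minimal sketch of the violation-and-fallback behaviour described in this excerpt, assuming an offline transition model with `predict`/`distance` methods and a `meta_policy` exposing `replan`; these names and the divergence threshold are hypothetical stand-ins and do not reproduce the actual MeRAP interface.

```python
# Hypothetical interfaces: model, policy, and meta_policy are assumptions.
def monitored_step(env, policy, meta_policy, model, state, threshold=0.1):
    """Take one step; if the observed transition diverges from the offline
    model beyond `threshold`, fall back to the meta policy and re-plan."""
    action = policy.act(state)
    next_state, reward, done, info = env.step(action)
    predicted = model.predict(state, action)            # offline assumption
    violation = model.distance(predicted, next_state)   # observed mismatch
    if violation > threshold:
        # Offline assumptions no longer hold: re-plan from the meta policy.
        policy = meta_policy.replan(state)
    return policy, next_state, reward, done
```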
“…ii) Monitoring based on inconsistencies during inference: These methods focus on detecting inconsistencies at runtime to avoid the robot making catastrophic decisions when deployed in a new environment. In Mallozzi et al. (2019), they propose a method to enforce certain properties (including any safety-critical requirements), which they call invariants, that the agent has to respect at all times while exploring complex partially observable environments using reinforcement learning. Their method, called WiseML, acts as a safety envelope over any existing reinforcement learning algorithm and prevents the agent from taking actions that violate the specified invariants.…”
Section: A. Online Methods (mentioning citation)
confidence: 99%
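
The safety-envelope idea summarised in this excerpt can be pictured as a thin wrapper around any agent's action selection: each proposed action is checked against the specified invariants and, if it would violate one, replaced with a known-safe alternative. The sketch below is a minimal illustration under assumed interfaces (an `act` method, invariant callables, a safe-action set); it is not WiseML's actual implementation.

```python
import random

# Illustrative sketch; all interfaces here are assumptions, not WiseML's API.
class SafetyEnvelope:
    """Wraps any RL agent and blocks actions that violate given invariants."""

    def __init__(self, agent, invariants, safe_actions):
        self.agent = agent              # any agent exposing .act(state)
        self.invariants = invariants    # callables: (state, action) -> bool
        self.safe_actions = safe_actions

    def act(self, state):
        action = self.agent.act(state)
        # Allow the proposed action only if every invariant holds.
        if all(inv(state, action) for inv in self.invariants):
            return action
        # Otherwise fall back to a randomly chosen invariant-respecting action.
        allowed = [a for a in self.safe_actions
                   if all(inv(state, a) for inv in self.invariants)]
        return random.choice(allowed) if allowed else action
```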