Synthesis of Minimum-Cost Shields for Multi-agent Systems

Bharadwaj, Suda; Bloem, Roderick; Dimitrova, Rayna; Könighofer, Bettina; Topcu, Ufuk

doi:10.23919/acc.2019.8815233

Cited by 19 publications

(16 citation statements)

References 20 publications

“…Shields are usually constructed offline by computing a maximally permissive policy containing all actions that will not violate the safety specification. Several extensions exist [4,6,29,39]. The shielding approach has been shown to be successful in combination with RL [2,21].…”

Section: Related Work (mentioning)

confidence: 99%

Online Shielding for Stochastic Systems

Könighofer, Rudolf, Palmisano, et al. 2021

Lecture Notes in Computer Science

We propose a method to develop trustworthy reinforcement learning systems. To ensure safety, especially during exploration, we automatically synthesize a correct-by-construction runtime enforcer, called a shield, that blocks all actions of the agent that are unsafe with respect to a temporal logic specification. Our main contribution is a new synthesis algorithm for computing the shield online. Existing offline shielding approaches exhaustively compute the safety of all state-action combinations ahead of time, resulting in huge computation times, large memory consumption, and significant delays at runtime due to look-ups in huge databases. The intuition behind online shielding is to compute at runtime the set of all states that could be reached in the near future. For each of these states, the safety of all available actions is analysed and used for shielding as soon as one of the considered states is reached. Our proposed method is general and can be applied to a wide range of planning problems with stochastic behaviour. For our evaluation, we selected a two-player version of the classical computer game Snake. The game requires fast decisions, and the multiplayer setting induces a large state space that is computationally expensive to analyze exhaustively. The safety objective of collision avoidance is easily transferable to a variety of planning tasks.
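The offline idea that this abstract contrasts against can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a small finite model with nondeterministic (rather than stochastic) transitions, and every name in it (`safe_region`, `shield`, `DELTA`, the toy states and actions) is hypothetical. It computes the states from which safety can be maintained forever, then keeps, for each such state, every action whose successors all stay inside that set.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

State = Hashable
Action = Hashable
# Hypothetical model: (state, action) -> set of possible successor states.
Transitions = Dict[Tuple[State, Action], Set[State]]


def _keeps_safe(delta: Transitions, s: State, a: Action, region: Set[State]) -> bool:
    """True if action `a` is defined in `s` and every successor lies in `region`."""
    successors = delta.get((s, a), set())
    return bool(successors) and successors <= region


def safe_region(states: Iterable[State], actions: Iterable[Action],
                delta: Transitions, unsafe: Set[State]) -> Set[State]:
    """Greatest fixed point: states from which some action always stays safe."""
    actions = list(actions)
    region = set(states) - set(unsafe)
    changed = True
    while changed:
        changed = False
        for s in list(region):
            # Drop s if no action is guaranteed to remain inside the region.
            if not any(_keeps_safe(delta, s, a, region) for a in actions):
                region.discard(s)
                changed = True
    return region


def shield(states: Iterable[State], actions: Iterable[Action],
           delta: Transitions, unsafe: Set[State]) -> Dict[State, Set[Action]]:
    """Map each safe state to all actions that cannot leave the safe region."""
    actions = list(actions)
    region = safe_region(states, actions, delta, unsafe)
    return {s: {a for a in actions if _keeps_safe(delta, s, a, region)}
            for s in region}


# Toy model (assumed for illustration): state 3 is a collision.
STATES = [0, 1, 2, 3]
ACTIONS = ["l", "r"]
UNSAFE = {3}
DELTA: Transitions = {
    (0, "l"): {0}, (0, "r"): {1},
    (1, "l"): {0}, (1, "r"): {2, 3},  # "r" may lead straight into the collision
    (2, "l"): {3}, (2, "r"): {3},     # state 2 is doomed, so it is not safe either
}

allowed = shield(STATES, ACTIONS, DELTA, UNSAFE)
print(allowed)
```

In this toy model the shield allows both actions in state 0 and only "l" in state 1. An online variant in the spirit of the abstract would run the same kind of check only for states reachable within a short horizon of the current state, instead of tabulating every state-action pair up front.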

“…The concept of guaranteeing safety at runtime is referred to as shielding [23]. Recent works employ shields in reinforcement learning [1], for Markov decision processes (MDP) [20], or for multi-agent systems [3], yet, none involve partially observable systems.…”

Section: B Related Work (mentioning)

confidence: 99%

Safe Policies for Factored Partially Observable Stochastic Games

Carr, Jansen, Bharadwaj, et al. 2021

Robotics: Science and Systems XVII

Self Cite

We study planning problems where a controllable agent operates under partial observability and interacts with an uncontrollable opponent, also referred to as the adversary. The agent has two distinct objectives: to maximize an expected value and to adhere to a safety specification. Multi-objective partially observable stochastic games (POSGs) formally model such problems. Yet, even for a single objective, the task of computing suitable policies for POSGs is theoretically hard and computationally intractable in practice. Using a factored state-space representation, we define a decoupling scheme for the POSG state space that, under certain assumptions on the observability and the reward structure, separates the state components relevant for the reward from those relevant for safety. This decoupling makes it possible to compute provably safe and reward-optimal policies in a tractable two-stage approach. In particular, on the fully observable components related to safety, we exactly compute the set of policies that captures all possible safe choices against the opponent. We restrict the agent's behavior to these safe policies and project the POSG to a partially observable Markov decision process (POMDP). Any reward-maximal policy for the POMDP is then guaranteed to be safe and reward-maximal for the POSG. We showcase our approach's feasibility using high-fidelity simulations of two case studies that concern UAV path planning and autonomous driving. Moreover, to demonstrate the practical applicability, we design a physical experiment involving a robot decision-making problem under energy constraints that is motivated by a helicopter paired with NASA's Perseverance Mars rover.

“…An existing approach called shielding [4,12] uses reactive synthesis and assumes that the shield has full knowledge and control of the whole system-in this case the entire UAM system and the vehicles it handles. A technique for synthesizing quantitative shields for multi-agent systems in a fully centralized manner was presented in [1]. However, all these approaches are only applicable if a feasible solution exists.…”

Section: Related Work (mentioning)

confidence: 99%

Minimum-Violation Traffic Management for Urban Air Mobility

Bharadwaj, Wongpiromsarn, Neogi, et al. 2021

Lecture Notes in Computer Science

Self Cite

Urban air mobility (UAM) refers to air transportation services in and over an urban area and has the potential to revolutionize mobility solutions. However, due to the projected scale of operations, current air traffic management (ATM) techniques are not viable. Increasingly autonomous systems are a pathway to accelerate the realization of UAM operations, but must be fielded safely and efficiently. The heavily regulated, safety-critical nature of aviation may lead to multiple, competing safety constraints that can be traded off based on the operational context. In this paper, we design a framework that allows for the scalable planning of a UAM ATM system. We formalize safety-oriented constraints derived from FAA regulations by encoding them as temporal logic formulae. We then propose a method for UAM ATM that is both scalable and minimally violates the temporal logic constraints. Numerical results show that the runtime of our proposed algorithm is suitable for very large problems and is backed by theoretical guarantees of correctness with respect to the given temporal logic constraints.
