2022
DOI: 10.1609/aaai.v36i7.20737

Exploring Safer Behaviors for Deep Reinforcement Learning

Abstract: We consider Reinforcement Learning (RL) problems where an agent attempts to maximize a reward signal while minimizing a cost function that models unsafe behaviors. Such a formalization is typically addressed in the literature through constrained optimization on the cost, which limits exploration and leads to a significant trade-off between cost and reward. In contrast, we propose a Safety-Oriented Search that complements Deep RL algorithms to bias the policy toward safety within an evolutionary cost optimization. We lever…
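
As a point of reference, the constrained formulation the abstract refers to is typically a Constrained MDP objective: maximize the expected discounted return subject to an upper bound on the expected discounted cost. The display below is this standard formulation, not an equation taken from the paper itself:

\[
\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le d
\]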

Cited by 18 publications (33 citation statements)
References 14 publications
“…Constrained reinforcement learning is an emerging field [13,14,12]. To show the effectiveness of our approach, we also compared it to an implementation of Lagrangian-PPO, as suggested by [15].…”
Section: Related Work
confidence: 99%
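
As a rough illustration of the Lagrangian-PPO baseline mentioned in this statement ([15]), the common recipe is to fold the cost constraint into the PPO policy loss through a Lagrange multiplier that is itself updated from the observed constraint violation. The sketch below is a generic, simplified version under that assumption; names such as ratio, reward_adv, cost_adv, ep_cost, and cost_limit are illustrative placeholders, not the cited implementation:

```python
import torch

def lagrangian_ppo_losses(ratio, reward_adv, cost_adv, ep_cost, cost_limit,
                          lam, lam_lr=0.05, clip_eps=0.2):
    """Simplified Lagrangian-PPO quantities (illustrative sketch, not the cited code).

    ratio      : pi_new(a|s) / pi_old(a|s) for a batch of state-action pairs
    reward_adv : advantage estimates for the reward signal
    cost_adv   : advantage estimates for the cost signal
    ep_cost    : average episode cost observed in the batch
    cost_limit : constraint threshold d
    lam        : current value of the Lagrange multiplier (>= 0)
    """
    # Clipped surrogate objective for the reward, as in standard PPO.
    reward_surr = torch.min(ratio * reward_adv,
                            torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * reward_adv)
    # Surrogate for the cost, penalized by the multiplier.
    cost_surr = ratio * cost_adv
    # Policy loss: maximize reward surrogate minus lam-weighted cost surrogate,
    # rescaled so the penalty term does not dominate when lam grows large.
    policy_loss = -(reward_surr - lam * cost_surr).mean() / (1.0 + lam)
    # Dual ascent on the multiplier: increase lam when the cost constraint is violated.
    new_lam = max(0.0, lam + lam_lr * (ep_cost - cost_limit))
    return policy_loss, new_lam
```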
“…An emerging family of approaches for achieving these two goals, known as constrained DRL [12], attempts to simultaneously optimize two functions: the reward, which encodes the main objective of the task; and the cost, which represents the safety constraints. Current state-of-the-art algorithms include IPO [13], SOS [14], CPO [12], and Lagrangian approaches [15]. Despite their success in some applications, these methods generally suffer from significant setbacks: (i) there is no uniform and human-readable way of defining the required safety constraints; (ii) it is unclear how to encode these constraints as a signal for the training algorithm; and (iii) there is no clear method for balancing cost and reward during training, and thus there is a risk of producing sub-optimal policies.…”
Section: Introduction
confidence: 99%
“…In particular, Safe DRL problems are typically modeled using Constrained Markov Decision Processes (CMDPs) [9], where an agent aims to maximize a reward signal while keeping the cost values accumulated upon visiting unsafe states under a hardcoded threshold. However, the constraints imposed by these approaches hinder exploration, often failing to learn safe behaviors in complex environments [10], [11]. Alternative ways have been investigated to overcome the difficulty of designing Safe DRL algorithms that incorporate a notion of risk into the optimization while avoiding unsafe situations [8], [12], [13].…”
Section: Introduction
confidence: 99%
“…To this end, [11], [15] proposed a sample-based approximation method to enumerate the number of states in the state space that violate a specific property. Such a value, referred to as the violation, has been used to induce safety information during training.…”
Section: Introduction
confidence: 99%
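
A minimal sketch of the kind of sample-based violation estimate described in this statement, assuming a hypothetical predicate violates(state, policy) that checks whether the property of interest is violated from a given state; all names are illustrative and the actual procedure of [11], [15] may differ:

```python
def estimate_violation(sample_state, violates, policy, n_samples=1000):
    """Monte Carlo estimate of the fraction of sampled states violating a property.

    sample_state : callable returning a random state from the space of interest
    violates     : callable (state, policy) -> bool, True if the property is violated
    policy       : the policy under evaluation
    """
    violations = 0
    for _ in range(n_samples):
        state = sample_state()
        if violates(state, policy):
            violations += 1
    # The resulting violation rate can be fed back as a safety signal during training.
    return violations / n_samples
```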