2022
DOI: 10.1609/aaai.v36i7.20737

Exploring Safer Behaviors for Deep Reinforcement Learning

Abstract: We consider Reinforcement Learning (RL) problems where an agent attempts to maximize a reward signal while minimizing a cost function that models unsafe behaviors. Such a formalization is typically addressed in the literature through constrained optimization on the cost, which limits exploration and leads to a significant trade-off between cost and reward. In contrast, we propose a Safety-Oriented Search that complements Deep RL algorithms to bias the policy toward safety within an evolutionary cost optimization. We lever…
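
As a point of reference, the constrained formulation the abstract refers to is typically a Constrained MDP objective: maximize the expected discounted return subject to an upper bound on the expected discounted cost. The display below is this standard formulation, not an equation taken from the paper itself:

\[
\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le d
\]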

Cited by 18 publications (33 citation statements)
References 14 publications
“…Constrained reinforcement learning is an emerging field [13,14,12]. To show the effectiveness of our approach, we also compared it to an implementation of Lagrangian-PPO, as suggested by [15].…”
Section: Related Work
confidence: 99%
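
As a rough illustration of the Lagrangian-PPO baseline mentioned in this statement ([15]), the common recipe is to fold the cost constraint into the PPO policy loss through a Lagrange multiplier that is itself updated from the observed constraint violation. The sketch below is a generic, simplified version under that assumption; names such as ratio, reward_adv, cost_adv, ep_cost, and cost_limit are illustrative placeholders, not the cited implementation:

```python
import torch

def lagrangian_ppo_losses(ratio, reward_adv, cost_adv, ep_cost, cost_limit,
                          lam, lam_lr=0.05, clip_eps=0.2):
    """Simplified Lagrangian-PPO quantities (illustrative sketch, not the cited code).

    ratio      : pi_new(a|s) / pi_old(a|s) for a batch of state-action pairs
    reward_adv : advantage estimates for the reward signal
    cost_adv   : advantage estimates for the cost signal
    ep_cost    : average episode cost observed in the batch
    cost_limit : constraint threshold d
    lam        : current value of the Lagrange multiplier (>= 0)
    """
    # Clipped surrogate objective for the reward, as in standard PPO.
    reward_surr = torch.min(ratio * reward_adv,
                            torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * reward_adv)
    # Surrogate for the cost, penalized by the multiplier.
    cost_surr = ratio * cost_adv
    # Policy loss: maximize reward surrogate minus lam-weighted cost surrogate,
    # rescaled so the penalty term does not dominate when lam grows large.
    policy_loss = -(reward_surr - lam * cost_surr).mean() / (1.0 + lam)
    # Dual ascent on the multiplier: increase lam when the cost constraint is violated.
    new_lam = max(0.0, lam + lam_lr * (ep_cost - cost_limit))
    return policy_loss, new_lam
```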
“…An emerging family of approaches for achieving these two goals, known as constrained DRL [12], attempts to simultaneously optimize two functions: the reward, which encodes the main objective of the task; and the cost, which represents the safety constraints. Current state-of-the-art algorithms include IPO [13], SOS [14], CPO [12], and Lagrangian approaches [15]. Despite their success in some applications, these methods generally suffer from significant setbacks: (i) there is no uniform and human-readable way of defining the required safety constraints; (ii) it is unclear how to encode these constraints as a signal for the training algorithm; and (iii) there is no clear method for balancing cost and reward during training, and thus there is a risk of producing sub-optimal policies.…”
Section: Introduction
confidence: 99%
“…In particular, Safe DRL problems are typically modeled using Constrained Markov Decision Processes (CMDPs) [9], where an agent aims to maximize a reward signal while keeping the cost values accumulated upon visiting unsafe states under a hardcoded threshold. However, the constraints imposed by these approaches hinder exploration, often failing to learn safe behaviors in complex environments [10], [11]. Alternative ways have been investigated to overcome the difficulty of designing Safe DRL algorithms that incorporate a notion of risk into the optimization while avoiding unsafe situations [8], [12], [13].…”
Section: Introduction
confidence: 99%
“…To this end, [11], [15] proposed a sample-based approximation method to enumerate the number of states in the state space that violate a specific property. Such a value, referred to as the violation, has been used to induce safety information during training.…”
Section: Introduction
confidence: 99%
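
A minimal sketch of the kind of sample-based violation estimate described in this statement, assuming a hypothetical predicate violates(state, policy) that checks whether the property of interest is violated from a given state; all names are illustrative and the actual procedure of [11], [15] may differ:

```python
def estimate_violation(sample_state, violates, policy, n_samples=1000):
    """Monte Carlo estimate of the fraction of sampled states violating a property.

    sample_state : callable returning a random state from the space of interest
    violates     : callable (state, policy) -> bool, True if the property is violated
    policy       : the policy under evaluation
    """
    violations = 0
    for _ in range(n_samples):
        state = sample_state()
        if violates(state, policy):
            violations += 1
    # The resulting violation rate can be fed back as a safety signal during training.
    return violations / n_samples
```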