2022
DOI: 10.1007/s10994-022-06187-8

Safety-constrained reinforcement learning with a distributional safety critic

Abstract: Safety is critical to broadening the real-world use of reinforcement learning. Modeling the safety aspects using a safety-cost signal separate from the reward and bounding the expected safety-cost is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, it can be risky to set constraints only on the expectation while neglecting the tail of the distribution, which might have prohibitively large values. In this paper, we propose a method called Worst-Case Soft Actor-Critic (WCSAC). […]
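The contrast drawn in the abstract can be written out explicitly. Below is a minimal formulation sketch, assuming standard constrained-MDP notation with reward r, safety cost c, cost budget d, discount γ, and risk level α; the tail constraint is shown as a CVaR-style bound, which is the flavor of constraint this line of work targets, not necessarily the exact objective of the paper.

```latex
% Expectation-constrained RL: only the mean of the cumulative safety cost is bounded.
\max_{\pi}\ \mathbb{E}_{\tau\sim\pi}\Big[\textstyle\sum_t \gamma^t r(s_t,a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\tau\sim\pi}\Big[\textstyle\sum_t \gamma^t c(s_t,a_t)\Big] \le d .

% Tail-constrained variant: bound the conditional value-at-risk of the cost return,
% i.e. the average of the worst (1-\alpha) fraction of outcomes, so rare but
% prohibitively large safety costs are also controlled.
\max_{\pi}\ \mathbb{E}_{\tau\sim\pi}\Big[\textstyle\sum_t \gamma^t r(s_t,a_t)\Big]
\quad \text{s.t.} \quad
\mathrm{CVaR}_{\alpha}\Big[\textstyle\sum_t \gamma^t c(s_t,a_t)\Big] \le d .
```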

Cited by 18 publications (26 citation statements). References 19 publications.
“…Our approach adopts the policy reuse strategy that directly leverages a guide policy to sample trajectories, which facilitates rapid adaptation to a new task (Rosman et al., 2016; Fernández and Veloso, 2006). This strategy leads to better initial trajectories and improves the jump-start by providing a strong initial point for the learning algorithm (Yang et al., 2022). The guide policy can take the form of a rule-based policy, expert policy, or well-trained policy (Ayeelyan et al., 2022).…”
Section: Policy Reuse
confidence: 99%
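A minimal sketch of the policy-reuse idea described in the quote above, assuming a classic Gym-style environment API; `guide_policy`, `learner_policy`, and the mixing coefficient `beta` are illustrative names rather than the cited authors' interface.

```python
import random

def rollout_with_guide(env, guide_policy, learner_policy, beta, max_steps=1000):
    """Collect one episode while mixing a guide policy with the learning policy.

    With probability `beta` the action comes from the guide policy, otherwise
    from the learner; annealing `beta` toward 0 over training hands control to
    the learner while the guide supplies strong initial trajectories.
    """
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        if random.random() < beta:
            action = guide_policy(state)    # reuse the rule-based/expert/pre-trained guide
        else:
            action = learner_policy(state)  # act with the policy being learned
        next_state, reward, done, info = env.step(action)
        trajectory.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    return trajectory
```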
“…These approaches mainly differ in terms of the way they parameterize the return distribution and the distance metric that is used to measure the difference between two distributions. In this work, we follow the authors of [Yang et al. 2023] and utilize the implicit quantile network (IQN) [Dabney et al. 2018a] to approximate the cost return distribution.…”
Section: Distributional Reinforcement Learning
confidence: 99%
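As a rough illustration of what an IQN-style critic for the cost return distribution looks like, here is a PyTorch sketch: sampled quantile fractions are embedded with cosine features and multiplied into the state-action embedding, so a single network represents the whole quantile function. Layer sizes, the cosine embedding dimension, and all names are assumptions for illustration, not the architecture used in the cited work.

```python
import math
import torch
import torch.nn as nn

class ImplicitQuantileCostCritic(nn.Module):
    """IQN-style critic sketch for the cost return distribution."""

    def __init__(self, obs_dim, act_dim, hidden=256, n_cos=64):
        super().__init__()
        self.n_cos = n_cos
        self.body = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU())
        self.tau_embed = nn.Sequential(nn.Linear(n_cos, hidden), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, obs, act, n_taus=32):
        batch = obs.shape[0]
        h = self.body(torch.cat([obs, act], dim=-1))              # (B, hidden)
        taus = torch.rand(batch, n_taus, 1, device=obs.device)    # quantile fractions in (0, 1)
        i = torch.arange(1, self.n_cos + 1, device=obs.device).float()
        cos = torch.cos(taus * i * math.pi)                       # (B, N, n_cos) cosine features
        phi = self.tau_embed(cos)                                 # (B, N, hidden)
        q = self.head(h.unsqueeze(1) * phi).squeeze(-1)           # (B, N) cost-return quantiles
        return q, taus.squeeze(-1)
```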
“…Worst-case soft actor-critic (WCSAC) [Yang et al. 2023] is a soft actor-critic (SAC) [Haarnoja et al. 2018a; Haarnoja et al. 2018b] based algorithm that uses a distributional safety critic to produce risk-averse behavior. To this end, the upper tail of the estimated distribution is used.…”
Section: Risk-averse Safe Reinforcement Learning
confidence: 99%
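The "upper tail of the estimated distribution" can be collapsed into a scalar risk measure such as CVaR and then used in place of the expected cost in the actor update. The sketch below shows one plausible way to estimate CVaR from IQN-style quantile samples; it illustrates the general idea rather than the exact estimator used in WCSAC.

```python
import torch

def cvar_from_quantiles(quantile_values, taus, alpha=0.9):
    """Estimate CVaR_alpha of the cost return from sampled quantiles.

    `quantile_values` (B, N) are predicted cost-return quantiles at the
    fractions `taus` (B, N).  For costs, risk aversion means focusing on the
    upper tail: we average the quantiles whose fraction exceeds `alpha`,
    i.e. the worst (1 - alpha) share of outcomes.
    """
    tail_mask = (taus >= alpha).float()                  # select the worst-case tail
    tail_weight = tail_mask.sum(dim=1).clamp(min=1.0)    # avoid division by zero
    cvar = (quantile_values * tail_mask).sum(dim=1) / tail_weight
    return cvar                                          # (B,) risk-averse cost estimate
```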
“…Apart from supervised and self-supervised learning, approaches such as meta-learning (Kirsch, van Steenkiste, and Schmidhuber 2020; Guo, Wu, and Lee 2022), transfer learning (Guo et al. 2019; Vrbančič and Podgorelec 2020) and curriculum learning (Bengio et al. 2009; Park and Park 2022; Hu et al. 2022) have also demonstrated their capacity to adapt pre-trained policies to novel environments through retraining (Packer et al. 2019). Integrating techniques such as safe RL into the training process is recommended to allow the agent to avoid hazardous states, thereby ensuring consistent performance and enhancing training efficiency (Yang et al. 2021, 2022). When confronted with an expanded state space, the latter category of solutions, which retrains learned policies, is often favored.…”
Section: Introduction
confidence: 99%