2018
DOI: 10.1145/3310090

Probabilistic Policy Reuse for Safe Reinforcement Learning

Abstract: This work introduces Policy Reuse for Safe Reinforcement Learning (PR-SRL), an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous reinforcement learning problems with continuous state and action spaces, where the dynamics are reasonably smooth and the space is Euclidean. The algorithm uses a monotonically increasing risk function that estimates the probability of ending in failure from a given state. Such a risk function is…
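To make the abstract's idea concrete, here is a minimal sketch of how a risk-triggered probabilistic policy reuse step could look. This is not the authors' implementation: the distance-based risk model, the threshold theta, the reuse probability psi, and the safe_policy/learned_policy callables are all illustrative assumptions.

```python
import numpy as np

def risk(state, known_states, delta=1.0):
    """Assumed risk model: risk grows monotonically with the distance
    to the closest previously visited (known) state, so it can be read
    as an increasing estimate of the probability of failure."""
    if len(known_states) == 0:
        return 1.0
    d = np.min(np.linalg.norm(known_states - state, axis=1))
    return float(min(1.0, d / delta))  # monotonic in d, capped at 1

def choose_action(state, known_states, safe_policy, learned_policy,
                  psi=0.9, theta=0.5, rng=np.random):
    """Illustrative risk-triggered policy reuse:
    - if the estimated risk exceeds theta, fall back to the safe
      teacher/baseline policy;
    - otherwise reuse the safe policy with probability psi, and act
      with the learned policy with probability 1 - psi."""
    if risk(state, known_states) > theta:
        return safe_policy(state)      # teacher advice in risky regions
    if rng.random() < psi:
        return safe_policy(state)      # probabilistic reuse of past policy
    return learned_policy(state)       # exploratory / learned action
```

In policy-reuse schemes of this kind, psi is typically decayed over episodes so that the agent leans on the reused policy early on and shifts toward its own learned policy as training progresses.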


Cited by 8 publications (6 citation statements)
References 28 publications (52 reference statements)
“…Safe reinforcement learning is an active field of research for which an extensive overview is given by Garcia et al [13]. Some approaches consider the setting in which safety must be learned through environmental interactions, which means safety constraints may be violated during training [7,25].…”
Section: Main Contributions and Related Work
confidence: 99%
“…Shown in Fig. 1b, states are labeled as known or unknown, depending on the agent having visited the state previously [24]. Finally, combining these concepts, the definition of a Safe State Space (SSS) and a Fatal State Space (FSS) can be given [25][26][27].…”
Section: Safe Learning
confidence: 99%
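The known/unknown labeling and the SSS/FSS partition described in the statement above can be sketched as a simple state classification. The `visited` set and `is_fatal` predicate are assumed inputs for illustration, not definitions from the cited papers.

```python
def partition_states(states, visited, is_fatal):
    """Illustrative sketch: split states into a Safe State Space (SSS),
    a Fatal State Space (FSS), and an unknown set, following the
    known/unknown labeling described in the citation above."""
    sss, fss, unknown = [], [], []
    for s in states:
        if s not in visited:
            unknown.append(s)   # never visited: labeled unknown
        elif is_fatal(s):
            fss.append(s)       # visited and failure-inducing: FSS
        else:
            sss.append(s)       # visited without failure: SSS
    return sss, fss, unknown
```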
“…This issue of distributional shift is a well-studied problem in the literature (Fujimoto, Meger, and Precup 2019; Kumar et al. 2019, 2020). Several solutions have been proposed, such as reverting to a safe policy (Richter and Roy 2017), forcefully resetting the agent (Ainsworth, Barnes, and Srinivasa 2019), or requesting human intervention (Laskey et al. 2016; García and Fernández 2019). In our problem definition, we assume that a safe policy is not known outside of the expert trajectories provided, that resetting the agent is not possible, and that human supervision in the true environment is very costly and therefore undesirable.…”
Section: Related Work
confidence: 99%