2019
DOI: 10.48550/arxiv.1911.09101
Preprint

Safe Policies for Reinforcement Learning via Primal-Dual Methods

Abstract: In this paper, we study the learning of safe policies in the setting of reinforcement learning problems. That is, we aim to control a Markov Decision Process (MDP) whose transition probabilities are unknown, but for which we have access to sample trajectories through experience. We define safety as the agent remaining in a desired safe set with high probability during the operation time. We therefore consider a constrained MDP where the constraints are probabilistic. Since there is no straightforward way to op…
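For orientation, here is a minimal sketch of the constrained formulation the abstract points to, written in standard CMDP notation; the safe set \mathcal{S}_{\text{safe}}, horizon T, tolerance \delta, and the value functions V_r, V_c are generic placeholders rather than the paper's exact symbols:

    \max_{\pi} \ \mathbb{E}_{\pi}\!\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t)\Big]
    \quad \text{s.t.} \quad
    \mathbb{P}_{\pi}\big(s_t \in \mathcal{S}_{\text{safe}} \ \text{for all} \ t \le T\big) \ \ge \ 1 - \delta,

which Lagrangian (primal-dual) approaches relax to the saddle-point problem

    \min_{\lambda \ge 0} \ \max_{\pi} \ V_r(\pi) + \lambda \big( V_c(\pi) - (1 - \delta) \big),

where V_c(\pi) denotes the probability of remaining in the safe set under \pi.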

Cited by 13 publications (28 citation statements)
References 32 publications

“…CMDP. The study of RL algorithms for CMDPs has received considerable attention due to the safety requirement (Altman, 1999; Paternain et al., 2019; Yu et al., 2019; Dulac-Arnold et al., 2019; García & Fernández, 2015). Our work is closely related to Lagrangian-based CMDP algorithms with optimistic policy evaluations (Efroni et al., 2020; Singh et al., 2020; Ding et al., 2021; Liu et al., 2021; Qiu et al., 2020).…”
Section: Related Work (mentioning)
confidence: 99%
“…Constrained RL: Several policy-gradient algorithms have seen success in practice [29,27,24,19,1,32]. Also of interest are works which utilize Gaussian processes to model the transition probabilities and value functions [5,30,18,7].…”
Section: Related Work (mentioning)
confidence: 99%
“…Several policy-gradient-based algorithms have been proposed to solve CMDPs. Lagrangian-based methods [29,27,24,19] formulate the CMDP problem as a saddle-point problem and optimize it via primal-dual methods, while Constrained Policy Optimization [1,32] (inspired by trust region policy optimization [26]) computes new dual variables from scratch at each update to maintain constraints during learning. Although these algorithms provide ways to learn an optimal policy, performance guarantees on reward regret, safety violation, or sample complexity are rare.…”
Section: Introduction (mentioning)
confidence: 99%
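To make the Lagrangian/primal-dual recipe these works share concrete, below is a minimal, generic sketch of alternating primal (policy) and dual (multiplier) updates. The function estimate_returns, the threshold b, and the step sizes are hypothetical placeholders standing in for any policy-gradient estimator; this is not the algorithm of any specific paper cited above.

def primal_dual(estimate_returns, theta, b, iters=1000,
                eta_theta=1e-2, eta_lam=1e-2):
    # Generic primal-dual loop for:  max_pi V_r(pi)  s.t.  V_c(pi) >= b.
    # estimate_returns(theta) is assumed to roll out the current policy and
    # return sample estimates (V_r, V_c, grad_r, grad_c).
    lam = 0.0                                  # dual variable (multiplier)
    for _ in range(iters):
        V_r, V_c, grad_r, grad_c = estimate_returns(theta)
        # Primal ascent step on the Lagrangian L = V_r + lam * (V_c - b) in theta.
        theta = theta + eta_theta * (grad_r + lam * grad_c)
        # Dual descent step on lam, projected onto lam >= 0.
        lam = max(0.0, lam - eta_lam * (V_c - b))
    return theta, lam

In practice the primal step would typically be a full policy-gradient or trust-region update rather than a single gradient step; the sketch only illustrates the saddle-point structure referred to above.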
“…Moreover, the actor-critic approach is complicated, and it is therefore difficult to characterize its convergence rate. Although a primal-dual method with a good policy approximation is shown to converge to a neighborhood of the global optimum, the resulting policy may not even satisfy the constraint [25], [27]. A notable exception is the finite CMDP, where primal-dual methods can ensure convergence to an optimal policy [20], [28], and a sublinear convergence rate has been established in [28].…”
Section: Introduction (mentioning)
confidence: 99%