2021
DOI: 10.48550/arxiv.2109.06332
Preprint

Achieving Zero Constraint Violation for Concave Utility Constrained Reinforcement Learning via Primal-Dual Approach

Abstract: Reinforcement learning is widely used in applications where one needs to make sequential decisions while interacting with the environment. The problem becomes more challenging when the decisions must also satisfy safety constraints. The problem is mathematically formulated as a constrained Markov decision process (CMDP). In the literature, various algorithms are available to solve CMDP problems in a model-free manner to achieve an ε-optimal cumulative reward with feasible policies. An ε-feasible …
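For orientation, the CMDP the abstract refers to can be written as the following constrained optimization. This is the standard discounted formulation, sketched here for illustration only; the symbols (discount factor γ, reward r, utility functions g_i, thresholds c_i) are generic and not taken verbatim from the paper:

\max_{\pi}\; J_r(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{s.t.} \quad
J_{g_i}(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, g_i(s_t, a_t)\right] \ge c_i,
\qquad i = 1, \dots, m.

Under this notation, a policy \pi is ε-optimal if J_r(\pi^\ast) - J_r(\pi) \le \varepsilon for an optimal feasible policy \pi^\ast, and ε-feasible if c_i - J_{g_i}(\pi) \le \varepsilon for every constraint i; zero constraint violation corresponds to requiring ε = 0 on the feasibility side.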

Cited by 6 publications (14 citation statements)
References 9 publications

“…Model-free RL algorithms have also been proposed [8][9][10] to solve CMDP. However, all of the above require a generator model, which simulates from any state and action.…”
Section: Can We Achieve Provably Sample-efficient and Model-free Expl...mentioning
confidence: 99%
“…These issues are exacerbated in the large state space. Hence, there are several recent works starting to investigate model-free algorithms for CMDPs, which directly update the value function or the policy without first estimating the model [8][9][10]. However, all of these works consider an easier setting compared to standard RL in that they assume access to a simulator [11] (a.k.a.…”
Section: Introductionmentioning
confidence: 99%
“…The detailed proof can be found in the Appendix of (Bai et al. 2021). The standard MDP equipped with a cost function for the constraints is called the constrained Markov decision process (CMDP) framework (Altman 1999).…”
Section: Introductionmentioning
confidence: 99%
“…Moreover, [36] considered the Nash equilibrium computation for general-sum stochastic Nash games. [37] designed a primal-dual algorithm for the constrained Markov decision process (equivalent to a saddle point problem) and utilized regret analysis to prove zero constraint violation. In a different line of research, [38] considered the case when a network of cooperative agents needs to solve a zero-sum game and proposed a distributed Bregman-divergence algorithm to compute a NE.…”
Section: Related Workmentioning
confidence: 99%
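
To make the primal-dual idea in the statement above concrete, below is a minimal tabular sketch in Python. It illustrates the generic Lagrangian primal-dual update for a CMDP, not the specific algorithm or analysis of the cited paper; the function names, the finite-difference gradient, and all step sizes are choices made here purely for readability.

import numpy as np

def discounted_return(P, pi, f, gamma=0.9):
    # Exact discounted return of stationary policy pi for per-step signal f
    # in a tabular MDP with transition tensor P[s, a, s'].
    n_states = f.shape[0]
    P_pi = np.einsum('sa,sax->sx', pi, P)   # state-to-state transitions under pi
    f_pi = np.einsum('sa,sa->s', pi, f)     # expected one-step signal under pi
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, f_pi)
    return v.mean()                          # uniform initial state distribution

def primal_dual_cmdp(P, r, g, c, iters=300, eta=0.5, dual_lr=0.1, gamma=0.9):
    # Maximize J_r(pi) subject to J_g(pi) >= c via the Lagrangian
    # L(pi, lam) = J_r(pi) + lam * (J_g(pi) - c): ascend in pi, descend in lam.
    n_states, n_actions = r.shape
    logits = np.zeros((n_states, n_actions))   # softmax policy parameters (primal)
    lam = 0.0                                  # Lagrange multiplier (dual)
    eps = 1e-4
    for _ in range(iters):
        pi = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        lagrangian = lambda p: (discounted_return(P, p, r, gamma)
                                + lam * discounted_return(P, p, g, gamma))
        base = lagrangian(pi)
        grad = np.zeros_like(logits)
        # Crude finite-difference gradient of the Lagrangian w.r.t. the logits
        # (illustration only; practical methods use policy-gradient estimators).
        for s in range(n_states):
            for a in range(n_actions):
                bumped = logits.copy()
                bumped[s, a] += eps
                pi_b = np.exp(bumped) / np.exp(bumped).sum(axis=1, keepdims=True)
                grad[s, a] = (lagrangian(pi_b) - base) / eps
        logits += eta * grad                                   # primal ascent step
        violation = c - discounted_return(P, pi, g, gamma)
        lam = max(0.0, lam + dual_lr * violation)              # projected dual step
    pi = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return pi, lam

Usage is straightforward: build P as a (states x actions x states) array of transition probabilities, r and g as (states x actions) arrays, and pick a threshold c; the returned multiplier lam indicates how binding the constraint is. In the cited line of work, the primal and dual updates are designed and analyzed so that both the reward regret and the cumulative constraint violation can be bounded, which is how the zero-constraint-violation guarantee mentioned above is obtained.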