2022
DOI: 10.1609/aaai.v36i4.20302

A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes

Abstract: This paper presents a model-free reinforcement learning (RL) algorithm for infinite-horizon average-reward Constrained Markov Decision Processes (CMDPs). For a sufficiently large learning horizon K, the proposed algorithm achieves sublinear regret and zero constraint violation. The bounds depend on the number of states S, the number of actions A, and two constants that are independent of the learning horizon K.
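For context, a hedged sketch of how regret and constraint violation are commonly defined in the infinite-horizon average-reward CMDP setting; the symbols $J^{*}$, $r_k$, $c_k$, and $\bar{c}$ below are illustrative conventions from the broader literature, not necessarily the paper's exact notation:

$$
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K}\bigl(J^{*}-r_k\bigr),
\qquad
\mathrm{Violation}(K) \;=\; \Bigl[\,\sum_{k=1}^{K}\bigl(\bar{c}-c_k\bigr)\Bigr]_{+},
$$

where $J^{*}$ is the optimal long-run average reward over policies satisfying the constraint, $r_k$ and $c_k$ are the reward and utility received at step $k$, and $\bar{c}$ is the required average utility level. Under this convention, "sublinear regret and zero constraint violation" means $\mathrm{Regret}(K)=o(K)$ while $\mathrm{Violation}(K)=0$ once $K$ is sufficiently large.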

Cited by 19 publications (27 citation statements). References 18 publications.
“…We consider the following ergodicity assumption in the rest of the paper, which is commonly made in the RL literature [Wang, 2017, Wei et al., 2020, Wu et al., 2020].…”
Section: Connected Superlevel Set Under Tabular Policy
confidence: 99%
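For readers unfamiliar with this line of work, one common form of such an ergodicity assumption (an illustrative statement, not necessarily the exact one used in the works cited above) is that the Markov chain induced by every stationary policy $\pi$ mixes uniformly fast:

$$
t_{\mathrm{mix}} \;:=\; \max_{\pi}\,\min\Bigl\{t \ge 1 \;:\; \max_{s}\bigl\|(P^{\pi})^{t}(s,\cdot)-\mu^{\pi}\bigr\|_{\mathrm{TV}} \le \tfrac{1}{4}\Bigr\} \;<\; \infty,
$$

where $P^{\pi}$ is the transition kernel under policy $\pi$ and $\mu^{\pi}$ its stationary distribution.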
“…A related problem is infinite-horizon non-episodic RL with provable guarantees (see Wei et al (2019, 2020); Dong et al (2019) and the references within) as this problem is also motivated by not using resets. In this setting, there is only one episode that goes on indefinitely.…”
Section: Related Work
confidence: 99%
“…$O(\sqrt{T \log T})$ hard violation when the objective is strongly convex [21]. Safe Reinforcement Learning: Safe reinforcement learning (RL) refers to reinforcement learning with safety constraints and has received great interest as well [5,17,19,26,46,11,43,16,15,14,29,4,44,9,20,47]. In safe RL, the agent optimizes the policy by interacting with the environment without violating safety constraints.…”
Section: Coca-soft
confidence: 99%