A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes

Wei, Honghao; Liu, Xin; Ying, Lei

doi:10.1609/aaai.v36i4.20302

Cited by 19 publications

(27 citation statements)

References 18 publications


“…We consider the following ergodicity assumption in the rest of the paper, which is commonly made in the RL literature [Wang, 2017; Wei et al., 2020; Wu et al., 2020].…”

Section: Connected Superlevel Set Under Tabular Policy (mentioning)

confidence: 99%

Connected Superlevel Set in (Deep) Reinforcement Learning and its Application to Minimax Theorems

Zeng, Doan, Romberg

2023

Preprint


The aim of this paper is to improve the understanding of the optimization landscape for policy optimization problems in reinforcement learning. Specifically, we show that the superlevel set of the objective function with respect to the policy parameter is always a connected set both in the tabular setting and under policies represented by a class of neural networks. In addition, we show that the optimization objective as a function of the policy parameter and reward satisfies a stronger "equiconnectedness" property. To the best of our knowledge, these are novel and previously unknown discoveries. We present an application of the connectedness of these superlevel sets to the derivation of minimax theorems for robust reinforcement learning. We show that any minimax optimization program which is convex on one side and is equiconnected on the other side observes the minimax equality (i.e. has a Nash equilibrium). We find that this exact structure is exhibited by an interesting robust reinforcement learning problem under an adversarial reward attack, and the validity of its minimax equality immediately follows. This is the first time such a result is established in the literature.

“…A related problem is infinite-horizon non-episodic RL with provable guarantees (see Wei et al. (2020, 2019); Dong et al. (2019) and the references within) as this problem is also motivated by not using resets. In this setting, there is only one episode that goes on indefinitely.…”

Section: Related Work (mentioning)

confidence: 99%

Provable Reset-free Reinforcement Learning by No-Regret Reduction

Nguyen, Cheng

2023

Preprint


Real-world reinforcement learning (RL) is often severely limited since typical RL algorithms heavily rely on the reset mechanism to sample proper initial states. In practice, the reset mechanism is expensive to implement due to the need for human intervention or heavily engineered environments. To make learning more practical, we propose a generic no-regret reduction to systematically design reset-free RL algorithms. Our reduction turns reset-free RL into a two-player game. We show that achieving sublinear regret in this two-player game would imply learning a policy that has both sublinear performance regret and sublinear total number of resets in the original RL problem. This means that the agent eventually learns to perform optimally and avoid resets. By this reduction, we design an instantiation for linear Markov decision processes, which is the first provably correct reset-free RL algorithm to our knowledge.


“…T log T) hard violation when the objective is strongly convex [21]. Safe Reinforcement Learning: Safe reinforcement learning (RL) refers to reinforcement learning with safety constraints and has received great interest as well [5,17,19,26,46,11,43,16,15,14,29,4,44,9,20,47]. In safe RL, the agent optimizes the policy by interacting with the environment without violating safety constraints.…”

Section: Coca-soft (mentioning)

confidence: 99%

Online Nonstochastic Control with Adversarial and Static Constraints

Liu, Yang, Ying

2023

Preprint


This paper studies online nonstochastic control problems with adversarial and static constraints. We propose online nonstochastic control algorithms that achieve both sublinear regret and sublinear adversarial constraint violation while keeping static constraint violation minimal against the optimal constrained linear control policy in hindsight. To establish the results, we introduce an online convex optimization with memory framework under adversarial and static constraints, which serves as a subroutine for the constrained online nonstochastic control algorithms. This subroutine also achieves the state-of-the-art regret and constraint violation bounds for constrained online convex optimization problems, which is of independent interest. Our experiments demonstrate the proposed control algorithms are adaptive to adversarial constraints and achieve smaller cumulative costs and violations. Moreover, our algorithms are less conservative and achieve significantly smaller cumulative costs than the state-of-the-art algorithm.
