Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence 2021
DOI: 10.24963/ijcai.2021/614

Policy Learning with Constraints in Model-free Reinforcement Learning: A Survey

Abstract: Reinforcement Learning (RL) algorithms have had tremendous success in simulated domains. These algorithms, however, often cannot be directly applied to physical systems, especially in cases where there are constraints to satisfy (e.g. to ensure safety or limit resource consumption). In standard RL, the agent is incentivized to explore any policy with the sole goal of maximizing reward; in the real world, however, ensuring satisfaction of certain constraints in the process is also necessary and essential. In th…
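For orientation, the constrained setting the survey addresses is usually formalized as a constrained Markov decision process (CMDP). The formulation below is a sketch in standard notation (reward r, cost c, discount γ, cost limit d) introduced here for illustration; it is not quoted from the abstract.

```latex
\[
\max_{\pi}\;\mathbb{E}_{\tau\sim\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\,r(s_t,a_t)\right]
\quad\text{s.t.}\quad
\mathbb{E}_{\tau\sim\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\,c(s_t,a_t)\right]\le d
\]
```

In words: the agent maximizes expected discounted return while keeping the expected discounted cost below the threshold d, rather than maximizing reward alone.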

Cited by 57 publications (26 citation statements)
References 19 publications
“…Safety in reinforcement learning is a challenging topic formally raised by García and Fernández [2015]. Readers can refer to the survey [Liu et al., 2021] for recent advances in safe RL. In this section, we only summarize the studies most related to our algorithm.…”
Section: Related Work (mentioning)
confidence: 99%
“…The most similar work to our proposed algorithm is Interior-point Policy Optimization (IPO) [Liu et al., 2020], which uses log-barrier functions as penalty terms to restrict policies to the feasible set. However, the interior-point method requires a feasible policy at initialization, which is not necessarily fulfilled and requires a further recovery step.…”
Section: Related Work (mentioning)
confidence: 99%
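As a rough illustration of the interior-point idea this statement refers to, a log-barrier surrogate replaces the hard constraint with a penalty that diverges at the constraint boundary. The notation below (J_R for expected return, J_C for expected cost, d for the cost limit, t > 0 for barrier sharpness) is illustrative and not necessarily the cited paper's exact formulation.

```latex
\[
\max_{\pi}\; J_R(\pi) \;+\; \frac{1}{t}\,\log\!\bigl(d - J_C(\pi)\bigr),
\qquad \text{well defined only when } J_C(\pi) < d
\]
```

Because the barrier term is undefined whenever J_C(π) ≥ d, the method needs a feasible policy at initialization, which is exactly the limitation the statement above points out.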
“…To formulate the learning-based design of a policy with a constraint, a constrained MDP (CMDP) [16] is appropriate. Many constrained DRL (CDRL) algorithms have been proposed using the CMDP formulation [17].…”
Section: I (mentioning)
confidence: 99%
“…In this study, we assume that the system model is unknown. Therefore, we design an optimal policy under the STL constraint using a CDRL algorithm [17]. Then, we define the following functions.…”
Section: An STL-Constrained Problem and a τ-CMDP (mentioning)
confidence: 99%
“…Although much progress has been made in RL, the work on constrained RL remains limited [42], [43]. The most common approach is to use Lagrangian relaxation [44], [45].…”
Section: Constrained Reinforcement Learning (mentioning)
confidence: 99%