Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence 2021
DOI: 10.24963/ijcai.2021/614

Policy Learning with Constraints in Model-free Reinforcement Learning: A Survey

Abstract: Reinforcement Learning (RL) algorithms have had tremendous success in simulated domains. These algorithms, however, often cannot be directly applied to physical systems, especially in cases where there are constraints to satisfy (e.g. to ensure safety or limit resource consumption). In standard RL, the agent is incentivized to explore any policy with the sole goal of maximizing reward; in the real world, however, ensuring satisfaction of certain constraints in the process is also necessary and essential. In th…
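For orientation, the constrained setting the survey addresses is usually formalized as a constrained Markov decision process (CMDP). The formulation below is a sketch in standard notation (reward r, cost c, discount γ, cost limit d) introduced here for illustration; it is not quoted from the abstract.

```latex
\[
\max_{\pi}\;\mathbb{E}_{\tau\sim\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\,r(s_t,a_t)\right]
\quad\text{s.t.}\quad
\mathbb{E}_{\tau\sim\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\,c(s_t,a_t)\right]\le d
\]
```

In words: the agent maximizes expected discounted return while keeping the expected discounted cost below the threshold d, rather than maximizing reward alone.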

Cited by 57 publications (26 citation statements)
References 19 publications
“…Safety in reinforcement learning is a challenging topic formally raised by García and Fernández [2015]. Readers can refer to the survey [Liu et al., 2021] for recent advances in safe RL. In this section, we only summarize the studies most related to our algorithm.…”
Section: Related Work (mentioning)
confidence: 99%
“…The most similar work to our proposed algorithm is Interior-point Policy Optimization (IPO) [Liu et al., 2020], which uses log-barrier functions as penalty terms to restrict policies to the feasible set. However, the interior-point method requires a feasible policy at initialization, which is not necessarily fulfilled and requires a further recovery step.…”
Section: Related Work (mentioning)
confidence: 99%
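As a rough illustration of the interior-point idea this statement refers to, a log-barrier surrogate replaces the hard constraint with a penalty that diverges at the constraint boundary. The notation below (J_R for expected return, J_C for expected cost, d for the cost limit, t > 0 for barrier sharpness) is illustrative and not necessarily the cited paper's exact formulation.

```latex
\[
\max_{\pi}\; J_R(\pi) \;+\; \frac{1}{t}\,\log\!\bigl(d - J_C(\pi)\bigr),
\qquad \text{well defined only when } J_C(\pi) < d
\]
```

Because the barrier term is undefined whenever J_C(π) ≥ d, the method needs a feasible policy at initialization, which is exactly the limitation the statement above points out.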
“…To formulate the learning-based design of a policy with a constraint, a constrained MDP (CMDP) [16] is appropriate. Many constrained DRL (CDRL) algorithms have been proposed using the CMDP formulation [17].…”
Section: I (mentioning)
confidence: 99%
“…In this study, we assume that the system model is unknown. Therefore, we design an optimal policy under the STL constraint using a CDRL algorithm [17]. Then, we define the following functions.…”
Section: An STL-Constrained Problem and a τ-CMDP (mentioning)
confidence: 99%
“…Although much progress has been made in RL, the work on constrained RL remains limited [42], [43]. The most common approach is to use Lagrangian relaxation [44], [45].…”
Section: Constrained Reinforcement Learning (mentioning)
confidence: 99%