2012
DOI: 10.1007/s10957-012-9989-5

An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes

Abstract: We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a l…
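The abstract describes a Lagrange-multiplier treatment of inequality constraints inside an online, average-cost actor–critic scheme with function approximation. As a rough illustration of that idea only (the environment, feature map, step-size exponents, and update rules below are assumptions for the sketch, not the algorithm of the paper), the following runs a linear average-cost critic on the fastest timescale, a softmax policy-gradient actor on a slower one, and a projected ascent step on the Lagrange multiplier on the slowest one.

```python
import numpy as np

# Minimal sketch of a Lagrangian-relaxed average-cost actor-critic update.
# Everything here (toy MDP, features, step sizes) is illustrative, not the
# exact algorithm of Bhatnagar and Lakshmanan (2012).

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2
alpha_bound = 0.3          # constraint level: long-run average of g must stay <= alpha_bound

phi = np.eye(n_states)     # linear critic features (one-hot here, i.e. tabular)
v = np.zeros(n_states)     # critic weights for the relaxed (Lagrangian) cost
theta = np.zeros((n_states, n_actions))  # actor parameters (softmax policy)
lam = 0.0                  # Lagrange multiplier for the inequality constraint
rho = 0.0                  # running estimate of the average relaxed cost

def policy(s):
    prefs = theta[s] - theta[s].max()
    p = np.exp(prefs)
    return p / p.sum()

def step(s, a):
    # Toy MDP dynamics with a per-step objective cost c and constraint cost g (assumed).
    s_next = rng.integers(n_states)
    c = float(s == a)              # objective sample-path cost
    g = float(a == 1)              # constraint sample-path cost
    return s_next, c, g

s = 0
for t in range(1, 50_000):
    # Three timescales: critic fastest, actor slower, multiplier slowest.
    a_crit, a_act, a_lam = 1.0 / t**0.55, 1.0 / t**0.8, 1.0 / t
    p = policy(s)
    a = rng.choice(n_actions, p=p)
    s_next, c, g = step(s, a)

    relaxed = c + lam * (g - alpha_bound)                    # Lagrangian per-step cost
    delta = relaxed - rho + phi[s_next] @ v - phi[s] @ v     # average-cost TD error
    rho += a_crit * (relaxed - rho)
    v += a_crit * delta * phi[s]

    # Policy-gradient step on the relaxed cost (descent, since these are costs).
    grad_log = -p.copy()
    grad_log[a] += 1.0
    theta[s] -= a_act * delta * grad_log

    # Multiplier ascends on constraint violation, projected to stay nonnegative.
    lam = max(0.0, lam + a_lam * (g - alpha_bound))
    s = s_next
```

The three diminishing step-size schedules play the role that multi-timescale stochastic approximation analyses make precise: the critic effectively equilibrates for the current actor, and the actor effectively equilibrates for the current multiplier.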

Cited by 66 publications (66 citation statements) · References 16 publications
“…Proof The proof follows in a similar manner as that of Theorem 3 in Bhatnagar and Lakshmanan (2012).…”
Section: Convergence Analysis of the Average Reward Risk-Sensitive Actor–Critic
Citation type: mentioning
Confidence: 89%
“…While most methods for solving constrained MDPs revolve around the use of mathematical programs, some reinforcement learning approaches have also been proposed for optimizing the average-reward objective and, to a lesser extent, for solving constrained instances of average-reward MDPs. Some noteworthy examples include the constrained actor-critic method proposed by Bhatnagar and Lakshmanan (2012), wherein a Lagrangian relaxation of the problem is used to incorporate steady-state costs into the objective function being optimized by the constrained actor-critic algorithm. A similar Lagrangian Q-learning approach is proposed by Lakshmanan and Bhatnagar (2012).…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
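The Lagrangian relaxation referred to in the statement above can be written schematically as follows; the symbols (J for the long-run average objective cost, G_i for the long-run average constraint costs, α_i for their prescribed bounds) are notation chosen here rather than taken from the cited papers.

```latex
% Constrained average-cost MDP (notation assumed here):
\min_{\theta}\; J(\theta)
\quad\text{subject to}\quad G_i(\theta) \le \alpha_i, \qquad i = 1,\dots,k

% Lagrangian relaxation folding the constraint costs into the objective:
L(\theta,\lambda) \;=\; J(\theta) \;+\; \sum_{i=1}^{k} \lambda_i\,\bigl(G_i(\theta)-\alpha_i\bigr),
\qquad \lambda_i \ge 0
```

The relaxed problem is then handled as a saddle-point problem: the actor descends on L in θ while the multiplier ascends in λ, which is how steady-state constraint costs end up inside the objective being optimized.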
“…Other Settings. We also want to mention some related literature in game theory [39,40,41,42,43], two-time-scale stochastic approximation [44,45,46,47,48,49,50,51,52,53], reinforcement learning [54,55,56,57,58], two-time-scale optimization [59,60], and decentralized optimization [61,62,63,64,65,66,67]. These works study different variants of two-time-scale methods mostly for solving a single optimization problem, and often aim to find global optimality (or fixed points) using different structure of the underlying problems (e.g., Markov structure in stochastic games and reinforcement learning or strong monotonicity in stochastic approximation).…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
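Since the statement above groups this paper with the two-time-scale stochastic approximation literature, a minimal illustration of that scheme may help; the drift functions, step-size exponents, and noise model below are arbitrary choices for the example, not taken from any of the cited works.

```python
import numpy as np

# Minimal two-time-scale stochastic approximation sketch (illustrative only).
# The fast iterate x tracks the fixed point of its own update for the current
# slow iterate y; the slow iterate y then evolves as if x had already converged.

rng = np.random.default_rng(1)
x, y = 0.0, 0.0

for t in range(1, 100_000):
    a_fast = 1.0 / t ** 0.6   # fast step size
    b_slow = 1.0 / t          # slow step size; b_slow / a_fast -> 0

    # Noisy updates with (arbitrary) fixed points x*(y) = 2y and y* = 1.
    x += a_fast * ((2.0 * y - x) + rng.normal(scale=0.1))
    y += b_slow * ((1.0 - y) + rng.normal(scale=0.1))

print(round(x, 2), round(y, 2))  # approaches (2.0, 1.0) approximately
```

The same separation of step sizes underlies actor–critic analyses: the critic's faster schedule lets it be treated as already converged when the slower actor update (and, in the constrained case, the still slower multiplier update) is analyzed.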