1994
DOI: 10.1137/1138077

Control of Random Sequences in Problems with Constraints

Cited by 7 publications (2 citation statements); references 7 publications.
“…We consider an infinite-horizon discounted constrained Markov decision process [79,6,4] - CMDP (S, A, P, r, u, b, γ, ρ) - where S, A are the state/action spaces, P is the transition kernel that specifies the transition probability P(s′ | s, a) from state s to next state s′ under action a ∈ A, r, u : S × A → [0, 1] are the reward/utility functions, b is the constraint threshold, γ ∈ [0, 1) is the discount factor, and ρ is the initial state distribution. A stationary stochastic policy π : S → ∆(A) determines a probability distribution π(· | s) ∈ ∆(A) over the action space A based on the current state, i.e., a_t ∼ π(· | s_t) at time t. Let Π be the set of all stochastic policies.…”
Section: Preliminaries
confidence: 99%
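The tuple quoted above translates directly into code. Below is a minimal sketch of a tabular CMDP with truncated discounted policy evaluation; the class name TabularCMDP, the array layout, and the truncation horizon are illustrative assumptions, not taken from the quoted paper.

import numpy as np

class TabularCMDP:
    # Tabular CMDP (S, A, P, r, u, gamma, rho) with constraint threshold b.
    def __init__(self, P, r, u, b, gamma, rho):
        # P[s, a, s2] : transition probability P(s2 | s, a)
        # r[s, a], u[s, a] : reward and utility, both in [0, 1]
        # b : constraint threshold; gamma in [0, 1) : discount factor
        # rho[s] : initial state distribution
        self.P, self.r, self.u = P, r, u
        self.b, self.gamma, self.rho = b, gamma, rho

    def discounted_value(self, pi, f, horizon=1000):
        # Evaluate E[ sum_t gamma^t f(s_t, a_t) ] for a stationary stochastic
        # policy pi[s, a] = pi(a | s), truncated after `horizon` steps.
        d = self.rho.copy()                        # state distribution at time t
        value = 0.0
        for t in range(horizon):
            sa = d[:, None] * pi                   # joint distribution over (s_t, a_t)
            value += self.gamma ** t * np.sum(sa * f)
            d = np.einsum('sa,sap->p', sa, self.P) # distribution of s_{t+1}
        return value

    def is_feasible(self, pi):
        # A policy satisfies the constraint if its utility value meets the threshold b.
        return self.discounted_value(pi, self.u) >= self.b

Here pi would be an |S| × |A| row-stochastic array; exact evaluation via (I − γ P_π)^{-1} is equally standard, the truncated sum just keeps the sketch short.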
“…The constrained Markov decision process (constrained MDP) is the classical formulation for constrained dynamic systems in the early stochastic control literature (e.g., [14,83,79,43,6]) and the recent constrained reinforcement learning (RL) literature (e.g., [18,2,4,91,41,72]). It applies to many constrained control problems by encoding other system specifications as constraints, and it admits a natural extension of constrained optimization and the Lagrangian over policies.…”
Section: Introduction
confidence: 99%
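The "Lagrangian over policies" mentioned above admits a one-line statement. A hedged sketch, assuming the standard discounted value functions V_r^π(ρ) and V_u^π(ρ) for reward and utility (this notation is an assumption, not taken from the quoted text):

    max_{π ∈ Π} V_r^π(ρ)   subject to   V_u^π(ρ) ≥ b,
    L(π, λ) = V_r^π(ρ) + λ (V_u^π(ρ) − b),   λ ≥ 0.

The constrained problem is equivalent to max_{π} min_{λ ≥ 0} L(π, λ), and under a Slater-type condition strong duality lets one instead solve the dual min_{λ ≥ 0} max_{π} L(π, λ), which is the form most primal-dual constrained RL methods exploit.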