1986
DOI: 10.1017/s0001867800015792

Time-average optimal constrained semi-Markov decision processes

Abstract: Optimal causal policies maximizing the time-average reward over a semi-Markov decision process (SMDP), subject to a hard constraint on a time-average cost, are considered. Rewards and costs depend on the state and action, and contain running as well as switching components. It is supposed that the state space of the SMDP is finite, and the action space compact metric. The policy determines an action at each transition point of the SMDP. Under an accessibility hypothesis, several notions of time ave…
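As a reading aid, the constrained problem described in the abstract can be sketched in standard notation; the symbols below (r, c, α, the state process X_s and action process A_s) are assumptions for illustration, not the paper's own notation, and the switching (lump-sum) components of reward and cost are omitted:

\[
\max_{\pi}\ \liminf_{t\to\infty}\frac{1}{t}\,\mathbb{E}^{\pi}\!\left[\int_{0}^{t} r(X_s, A_s)\,ds\right]
\quad\text{subject to}\quad
\limsup_{t\to\infty}\frac{1}{t}\,\mathbb{E}^{\pi}\!\left[\int_{0}^{t} c(X_s, A_s)\,ds\right]\le\alpha .
\]

Here π ranges over causal policies that choose an action at each transition epoch of the SMDP, as described in the abstract.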

Cited by 21 publications (33 citation statements)
References 7 publications
“…The following results are obtained through a straightforward adaptation to the discounted case of the proofs of Lemma 3.1, Theorem 4.3 and Theorem 4.4 from Beutler and Ross (1985) and Corollary 3.5 from Beutler and Ross (1986):…”
Section: The Problem UDP[θ] Just Defined Is Thus a Lagrangian Relaxation
confidence: 99%
“…The Lagrange multiplier formulation relating the constrained optimization to an unconstrained optimization [25], [26] is used in this paper to deal with the handoff dropping constraint. To fit into this formulation, we need to include the history information in our state descriptor.…”
Section: Constraints
confidence: 99%
“…To deal with the fairness constraint, we use the Lagrange multiplier framework studied in Beutler and Ross (1986). Since the fairness constraint is a past-dependent constraint (the vector R(s n+1 ) depends on the rejection ratios over the past history), to fit into this framework, we need to include this history information into our state descriptor.…”
Section: Fairness Constraint
confidence: 99%
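The Lagrange multiplier formulation invoked in the two statements above can be sketched as follows (the notation is assumed for illustration and does not reproduce the cited papers' equations): the constrained problem is relaxed to an unconstrained one with a scalar multiplier λ ≥ 0,

\[
\max_{\pi}\ \Big\{\, R(\pi) - \lambda\,\big(C(\pi)-\alpha\big) \,\Big\},
\]

where R(π) and C(π) denote the time-average reward and cost under policy π and α is the constraint level; λ is then adjusted so that a maximizer of the relaxed problem also satisfies the constraint.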
“…If there exists a non-randomized policy π ω * that solves the Bellman optimality equation associated with reward function (29), and in the mean time, achieves the equality in (27), then Beutler and Ross (1986) shows that π ω * is the constrained optimal policy.…”
Section: Fairness Constraint
confidence: 99%
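The sufficiency condition paraphrased in the last statement can be summarized roughly as follows (a sketch under assumed notation; equations (27) and (29) refer to the citing paper and are not reproduced here):

\[
\pi^{*}\in\arg\max_{\pi}\big\{R(\pi)-\omega\,C(\pi)\big\}
\ \text{and}\ C(\pi^{*})=\alpha
\ \Longrightarrow\
\pi^{*}\in\arg\max_{\pi}\big\{R(\pi):\, C(\pi)\le\alpha\big\},
\]

i.e. a non-randomized policy that maximizes the Lagrangian objective for some multiplier ω ≥ 0 and meets the cost constraint with equality is optimal for the original constrained problem.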