1975
DOI: 10.1111/j.1467-9574.1975.tb00238.x

Discounted semi‐Markov decision processes: linear programming and policy iteration

Abstract: For semi‐Markov decision processes with discounted rewards we derive the well known results regarding the structure of optimal strategies (nonrandomized, stationary Markov strategies) and the standard algorithms (linear programming, policy iteration). Our analysis is completely based on a primal linear programming formulation of the problem.
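To make the two standard algorithms named in the abstract concrete, here is a minimal policy-iteration sketch for a discounted semi-Markov decision process. It is not taken from the paper: the array layout, the function name, and the summary of sojourn-time discounting by expected discount factors beta[i, a, j] are all illustrative assumptions; for an ordinary discounted MDP, beta would simply be filled with one constant in (0, 1).

```python
import numpy as np

def policy_iteration(r, p, beta, max_iter=100):
    """Policy iteration for a discounted semi-Markov decision process (sketch).

    r[i, a]       -- expected reward for choosing action a in state i
    p[i, a, j]    -- probability of jumping from i to j under action a
    beta[i, a, j] -- expected discount factor over the random sojourn time,
                     E[exp(-alpha * tau)]; assumed to lie strictly below 1
    """
    n, m = r.shape
    idx = np.arange(n)
    f = np.zeros(n, dtype=int)                 # arbitrary initial stationary policy
    for _ in range(max_iter):
        # Policy evaluation: solve (I - M_f) v = r_f with
        # M_f[i, j] = beta[i, f(i), j] * p[i, f(i), j].
        M = beta[idx, f] * p[idx, f]
        v = np.linalg.solve(np.eye(n) - M, r[idx, f])
        # Policy improvement: one-step lookahead in every state.
        q = r + np.einsum('iaj,iaj,j->ia', beta, p, v)
        f_new = q.argmax(axis=1)
        if np.array_equal(f_new, f):           # no state improves: f is optimal
            break
        f = f_new
    return f, v
```

The returned policy is nonrandomized and stationary, matching the structural result stated in the abstract.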

Cited by 13 publications (7 citation statements); citing publications range from 1976 to 2022.
References 3 publications.
“…These models can be treated by linear programming, too, as was shown by Fox [1966], Denardo/Fox [1968], Osaki/Mine [1968], Denardo [1970], Hinomoto [1971] (who started from a management control problem) and Wessels/van Nunen [1975] (whose analysis is completely based on a primal linear programming formulation). Mine/Tabata [1970] at first transformed a continuous parameter model into a discrete one and then applied a linear programming method.…”
Section: Semi-Markovian Decision Models and Further Related Topics
confidence: 97%
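As a reading aid for the primal formulation mentioned in the snippet above, the discounted (semi-)Markov decision problem is commonly written as the following linear program; the weights alpha_i > 0 and the discount kernel beta are generic placeholders here, not necessarily the exact formulation used by Wessels/van Nunen [1975]:

```latex
\min_{v}\ \sum_{i \in S} \alpha_i v_i
\qquad \text{s.t.} \qquad
v_i \;\ge\; r(i,a) + \sum_{j \in S} \beta(i,a,j)\, p(i,a,j)\, v_j
\qquad \text{for all } i \in S,\ a \in K(i).
```

At an optimum, v is the optimal value vector, and selecting in each state an action whose constraint is tight yields a nonrandomized stationary optimal policy.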
“…As known, [Blackwell, 1962], [Wessels and van Nunen, 1975], we can restrict ourselves to nonrandomized stationary policies, which will be denoted by f ∈ K := K(1) × K(2) × ... × K(N). The components u_f(i) of the N × 1 vector u_f give the total expected discounted reward if the initial state is i and policy f is used.…”
Section: Preliminaries
confidence: 99%
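For a fixed policy f, the vector u_f in the snippet above satisfies a linear fixed-point equation. Written for the plain discounted MDP case (a constant discount factor beta, with transition matrix P_f and reward vector r_f under f; the constant-beta setting is an assumption, not the semi-Markov generality of the paper):

```latex
u_f = r_f + \beta P_f u_f
\qquad \Longrightarrow \qquad
u_f = (I - \beta P_f)^{-1} r_f ,
```

where the inverse exists because the spectral radius of \beta P_f is at most \beta < 1.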
“…Hence for this new Markov decision process attention may be restricted to memoryless strategies (e.g. [6], [10], [12]), which implies the same for the original problem. This new Markov decision process is defined in the following way: S̄, the new set of states, consists of s_0 and two representations of S: S^* = {s^* | s ∈ S} and S_* = {s_* | s ∈ S}.…”
Section: Proof
confidence: 99%