Linear programming versions of some control problems on Markov chains are derived, and are studied under conditions which occur in typical problems which arise by discretizing continuous time and state systems, or in discrete state systems. Control interpretations of the dual variables and simplex multipliers are given. The formulation allows the treatment of 'state space'-like constraints which cannot be handled conveniently with dynamic programming. The relation between dynamic programming on Markov chains and the deterministic discrete maximum principle is explored, and some insight is obtained into the problem of singular stochastic controls (with respect to a stochastic maximum principle).
1. Introduction

This paper is concerned with several problems occurring in the control of a Markov chain $\{X_n\}$ on the state space $S = \{0, 1, \dots, N\}$, with transition probabilities $p_{ij}(\alpha)$, where $\alpha$, a control, takes values in a set $U_i$. State 0 is a desired target state and $p_{00}(\alpha) = 1$; once in state 0, always in state 0. The term $u = (u_1, \dots, u_N)$, $u_i \in U_i$, denotes a control vector; i.e., if the control vector $u$ is always used and $X_n = i$, then the value of $\alpha$ in $p_{ij}(\alpha)$ is $u(X_n) = u_i$. Let $T$ denote the first time state 0 is attained, $k(i, \alpha)$ the cost paid when the state is $i$ and control $u(X_n) = u_i = \alpha$ is used, and $E$ the expectation operator. Define problem (P1): Let $U_i$ contain a finite number of points (which, for convenience, we assume are $a_1, \dots, a_q$), or let the $(N+1)$-dimensional set $(p_{i1}(U_i), \dots, p_{iN}(U_i), k(i, U_i))$ be a convex polyhedron with extreme points included in $\{(p_{i1}(a_r), \dots, p_{iN}(a_r), k(i, a_r)),\ r = 1, \dots, q\}$. Assume (A1). In Section 2, a linear programming formulation of (P1) will be given. Linear programming (L.P.) versions of many types of dynamic programming problems are well known (see, e.g., [3]-[5], [9]). Indeed, an L.P. version of (P1) was given by Derman [6]. The variables in the L.P. form in [6] do not seem to have a simple physical interpretation. However, the form here seems more natural and has a more natural dual, namely the dynamic programming equations for (P1), in inequality form ($V_i$ being the minimal expected cost of reaching state 0 from state $i$):

$$V_i \le \sum_{j=1}^{N} p_{ij}(a_r) V_j + k(i, a_r), \qquad \text{all } i, r.$$
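To make the L.P. concrete before Section 2, the following is a minimal sketch, not taken from the paper: it maximizes $\sum_i V_i$ subject to the inequalities above, which recovers the dynamic programming solution. The two-state chain, the two actions, and all numerical values are hypothetical, and scipy.optimize.linprog merely stands in for any L.P. solver.

import numpy as np
from scipy.optimize import linprog

N, Q = 2, 2                      # states 1..N (state 0 is absorbing), actions a_1..a_Q
# P[r, i, j]: hypothetical probability of moving from state i to state j
# (j = 0..N) under action a_r (0-based array indices for 1-based states
# and actions); each row sums to one.
P = np.array([[[0.5, 0.5, 0.0],  # action a_1: rows are "from state 1", "from state 2"
               [0.0, 0.5, 0.5]],
              [[0.9, 0.1, 0.0],  # action a_2
               [0.5, 0.0, 0.5]]])
k = np.array([[1.0, 1.0],        # k(i, a_1), i = 1, 2 (hypothetical costs)
              [4.0, 2.0]])       # k(i, a_2)

# One inequality per pair (i, r):  V_i - sum_{j>=1} p_ij(a_r) V_j <= k(i, a_r).
# Since V_0 = 0, the j = 0 column drops out.
A_ub, b_ub = [], []
for r in range(Q):
    for i in range(N):
        row = -P[r, i, 1:].copy()
        row[i] += 1.0
        A_ub.append(row)
        b_ub.append(k[r, i])

# linprog minimizes, so maximizing sum_i V_i means minimizing -sum_i V_i.
res = linprog(c=-np.ones(N), A_ub=np.array(A_ub), b_ub=b_ub,
              bounds=[(None, None)] * N)
print(res.x)                     # [2. 4.]: minimal expected costs V_1, V_2

Note that each inequality row is indexed by a state-control pair $(i, a_r)$, so the dual variable attached to that row carries a state and a control label; this is consistent with the control interpretation of the dual variables mentioned in the abstract.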
While experience indicates that the linear programming algorithm (simplex method) is generally inferior, in computational efficiency, to the available dynamic programming iterative methods (for the type of problems discussed here), it is of interest since it is an alternative formulation which sheds further light on the Markov optimization problem and, in addition, for two important reasons:

(a) There may be additional constraints on the probabilities $P\{X_n = i\}$ (Section 2). Dynamic programming is then not directly applicable, and the L.P. formulation yields useful insights into the optimization problem. Indeed, it is often desirable or necessary to add such constraints in Markov control problems; see Section 2 for an example, and the sketch following this list.

(b) The L.P. formulation gives us insight into a form of a stochastic maximum principle (Section 3), and into the singularity problem of the stochastic maximum principle.
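To illustrate point (a), the following hedged sketch (an assumed occupation-frequency form, not necessarily the exact formulation of Section 2) writes the primal L.P. in the expected state-action frequencies $\varphi_{ir} = E \sum_{n < T} 1\{X_n = i,\ u(X_n) = a_r\}$. A bound on $\sum_n P\{X_n = i\}$, the expected number of visits to state $i$, then enters as a single extra row; this is exactly the kind of constraint dynamic programming cannot absorb directly. The chain and costs are the same hypothetical data as in the previous sketch.

import numpy as np
from scipy.optimize import linprog

N, Q = 2, 2                      # states 1..N (state 0 absorbing), actions a_1..a_Q
P = np.array([[[0.5, 0.5, 0.0],  # hypothetical p_ij(a_r), as before
               [0.0, 0.5, 0.5]],
              [[0.9, 0.1, 0.0],
               [0.5, 0.0, 0.5]]])
k = np.array([[1.0, 1.0],
              [4.0, 2.0]])
alpha = np.array([1.0, 0.0])     # hypothetical initial distribution over states 1..N
nvar = N * Q                     # variables phi[i, r], flattened as index i*Q + r

# Flow conservation: sum_r phi[j, r] - sum_{i, r} p_ij(a_r) phi[i, r] = alpha_j.
A_eq = np.zeros((N, nvar))
for j in range(N):
    for i in range(N):
        for r in range(Q):
            A_eq[j, i * Q + r] -= P[r, i, j + 1]
    for r in range(Q):
        A_eq[j, j * Q + r] += 1.0

# Objective: expected total cost  sum_{i, r} k(i, a_r) phi[i, r].
c = np.array([k[r, i] for i in range(N) for r in range(Q)])

# Side constraint (hypothetical): the expected number of visits to state 1,
# sum_n P{X_n = 1} = phi[1, a_1] + phi[1, a_2], is at most 1.5.
A_ub = np.zeros((1, nvar))
A_ub[0, 0:Q] = 1.0
res = linprog(c, A_ub=A_ub, b_ub=[1.5], A_eq=A_eq, b_eq=alpha,
              bounds=[(0, None)] * nvar)
print(res.fun, res.x)            # cost 3.375; phi = [0.875, 0.625, 0, 0]

For these numbers the constraint binds and both frequencies for state 1 are positive: the constrained optimum randomizes between $a_1$ and $a_2$ in state 1, which dynamic programming over pure (deterministic) control vectors $u$ would never produce. This is one concrete sense in which such constraints fall outside the dynamic programming framework.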