We consider a general class of total cost Markov decision processes (MDP) in which the one-stage costs can have arbitrary signs, but the sum of the negative parts of the one-stage costs is finite for all policies and all initial states. This class, which we refer to as the general convergence (GC) total cost model, contains several important subclasses of problems, e.g., positive costs problems, bounded negative costs problems, and discounted problems with unbounded one-stage costs. We study the convergence of value iteration for the (GC) model, in the Borel MDP framework with universally measurable policies. Our main results include (i) convergence of value iteration when starting from certain functions above the optimal cost function; (ii) convergence of transfinite value iteration starting from zero, as well as convergence of ordinary nontransfinite value iteration for finite control or certain finite state (GC) problems, in the special case where the optimal cost function is nonnegative; and (iii) partial convergence of value iteration starting from zero, for a subset of initial states. These results extend several previously known results about the convergence of value iteration for either positive costs problems or (GC) total cost problems.
Introduction. In this paper we study convergence properties of value iteration for a class of Markov decision processes (MDP) under the undiscounted total cost criterion. Specifically, we consider problems in which the one-stage costs can take both positive and negative values, but we assume that under any policy and for any starting state, the expected total sum, over the infinite horizon, of the negative parts of the one-stage cost is finite. Following the extensive survey on this class of MDP (Feinberg [15]), we shall refer to this total cost model as the general convergence total cost model (GC). It contains several classes of special models, in particular, the positive costs model (P), where all the one-stage costs are nonnegative, the negative costs model (N), where all the one-stage costs are nonpositive and the optimal costs are finite, as well as the bounded negative costs model [28, Chap. 7.2] and a subset of the discounted models with unbounded costs (UD).
It is known that for the (GC) model, value iteration starting from the constant function zero need not converge to the optimal cost function, surprisingly, even for finite state and control problems (see van der Wal [37, Ex. 3.2], Feinberg [15, Ex. 6.10]). This is also true for the positive costs model (P) (although convergence is ensured for (P) if each state has only a finite number of feasible controls [2, Prop. 9.18] or more generally, if certain compactness conditions [1], [2, Prop. 9.17] or semicontinuity and compactness model assumptions [29] are satisfied). In the (P) case, Maitra and Sudderth established the convergence of transfinite value iteration [24], and Whittle formulated a sufficient condition, called the bridging condition, for the