This paper presents sufficient conditions for the existence of stationary optimal policies for average cost Markov decision processes with Borel state and action sets and weakly continuous transition probabilities. The one-step cost functions may be unbounded, and the action sets may be noncompact. The main contributions of this paper are: (i) general sufficient conditions for the existence of stationary discount optimal and average cost optimal policies and descriptions of properties of value functions and sets of optimal actions, (ii) a sufficient condition for the average cost optimality of a stationary policy in the form of optimality inequalities, and (iii) approximations of average cost optimal actions by discount optimal actions.

Key words: Markov decision process; average cost per unit time; optimality inequality; optimal policy
MSC2000 subject classification: Primary: 90C40; secondary: 90C39
OR/MS subject classification: Primary: dynamic programming/optimal control; secondary: Markov, infinite state

1. Introduction. This paper provides sufficient conditions for the existence of stationary optimal policies for average cost Markov decision processes (MDPs) with Borel state and action sets and weakly continuous transition probabilities. The cost functions may be unbounded, and the action sets may be noncompact. The main contributions of this paper are: (i) general sufficient conditions for the existence of stationary discount optimal and average cost optimal policies and descriptions of properties of value functions and sets of optimal actions (Theorems 1, 3, and 4), (ii) a new sufficient condition for average cost optimality based on optimality inequalities (Theorem 2), and (iii) approximations of average cost optimal actions by discount optimal actions (Theorem 5).

For infinite-horizon MDPs, there are two major criteria: average costs per unit time and expected total discounted costs. The former is typically more difficult to analyze. The so-called vanishing discount factor approach is often used to approximate average costs per unit time by normalized expected total discounted costs. The literature on average cost MDPs is vast; most of the earlier results are surveyed in Arapostathis et al. [1]. Here, we mention just a few references.

For finite state and action sets, Derman [10] proved the existence of stationary average cost optimal policies. This result follows from Blackwell [6], and it was also proved independently by Viskov and Shiryaev [31]. When either the state set or the action set is infinite, even ε-optimal policies may not exist for some ε > 0; see Ross [25], Dynkin and Yushkevich [11, Chapter 7], and Feinberg [12, §5]. For a finite state set and compact action sets, optimal policies may not exist; see Bather [2], Chitashvili [9], and Dynkin and Yushkevich [11, Chapter 7].

For MDPs with finite state and action sets, there exist stationary policies satisfying the optimality equations (see Dynkin and Yushkevich [11, Chapter 7], where these equations are called canonical), and, furthermore, any stationary policy satisfying these equations is average cost optimal.
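To make the preceding discussion concrete, the following display sketches the standard objects behind the vanishing discount factor approach and an optimality inequality of the kind referenced in Theorem 2. The notation (one-step cost c, transition probability q, discount factor α, state space X, action sets A(x)) is assumed here for illustration and is not taken verbatim from the paper.

% A minimal LaTeX sketch under assumed notation (c, q, \alpha, \mathbb{X}, A(x) are ours).
% Expected total discounted cost and average cost per unit time of a policy \pi:
\[
  v_\alpha^\pi(x) = \mathbb{E}_x^\pi \sum_{n=0}^{\infty} \alpha^n c(x_n, a_n),
  \qquad
  w^\pi(x) = \limsup_{N \to \infty} \frac{1}{N}\,
             \mathbb{E}_x^\pi \sum_{n=0}^{N-1} c(x_n, a_n).
\]
% The vanishing discount factor approach studies w via the normalized values
% (1 - \alpha) v_\alpha(x) as the discount factor \alpha increases to 1.
% An average cost optimality inequality (ACOI) of the kind behind Theorem 2:
\[
  w^{*} + u(x) \ge \inf_{a \in A(x)}
    \Bigl[\, c(x, a) + \int_{\mathbb{X}} u(y)\, q(dy \mid x, a) \Bigr],
  \qquad x \in \mathbb{X}.
\]
% Under suitable conditions on u, a stationary policy attaining the infimum
% on the right-hand side for every state x has average cost at most w^{*}.

Under suitable conditions, a stationary policy attaining the infimum on the right-hand side has average cost at most w*, which is why the inequality form can suffice for optimality even when the optimality equation fails to hold.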
This paper introduces and develops a new approach to the theory of continuous-time jump Markov decision processes (CTJMDPs). This approach reduces discounted CTJMDPs to discounted semi-Markov decision processes (SMDPs) and, eventually, to discrete-time Markov decision processes (MDPs). The reduction is based on the equivalence between strategies that may change actions between jumps and randomized strategies that change actions only at jump epochs. This equivalence holds both for one-criterion problems and for multiple-objective problems with constraints. In particular, this paper develops the theory for multiple-objective problems with expected total discounted rewards and constraints. If a problem is feasible, there exist three types of optimal policies: (i) nonrandomized switching stationary policies, (ii) randomized stationary policies for the CTJMDP, and (iii) randomized stationary policies for the corresponding SMDP with exponentially distributed sojourn times, and these policies can be implemented as randomized strategies in the CTJMDP.
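As an illustration of why such a reduction works, consider the discounted criterion in a CTJMDP at a state x where an action a is held until the next jump. The notation below (reward rate r, jump rate q, discount rate α, sojourn time τ) is assumed for this sketch and is not the paper's own.

% A minimal LaTeX sketch under assumed notation; \tau \sim \mathrm{Exp}(q(x,a)).
% Expected discounted reward earned during one sojourn, and the expected
% discount factor applied to everything after the jump:
\[
  \mathbb{E} \int_0^{\tau} e^{-\alpha t}\, r(x, a)\, dt
    = \frac{r(x, a)}{\alpha + q(x, a)},
  \qquad
  \mathbb{E}\, e^{-\alpha \tau} = \frac{q(x, a)}{\alpha + q(x, a)}.
\]

These two quantities play the role of the one-step reward and the state-action dependent discount factor of a discrete-time MDP, which is one way to see how a discounted SMDP with exponentially distributed sojourn times collapses to a discounted MDP.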