In this paper we focus on the finite-horizon optimality for denumerable continuous-time Markov decision processes, in which the transition and reward/cost rates are allowed to be unbounded, and the optimality is over the class of all randomized history-dependent policies. Under mild reasonable conditions, we first establish the existence of a solution to the finite-horizon optimality equation by designing a technique of approximations from the bounded transition rates to unbounded ones. Then we prove the existence of ε(≥ 0)-optimal Markov policies and verify that the value function is the unique solution to the optimality equation by establishing the analog of the Itô-Dynkin formula. Finally, we provide an example in which the transition rates and the value function are all unbounded and, thus, obtain solutions to some of the unsolved problems by Yushkevich (1978).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.