This paper is the first attempt to investigate the risk probability criterion in semi-Markov decision processes with loss rates. The goal is to find an optimal policy that minimizes the risk probability that the total loss incurred during a first passage time to some target set exceeds a loss level. First, we establish the optimality equation via a successive approximation technique, and show that the value function is the unique solution to the optimality equation. Second, we give suitable conditions under which we prove the existence of optimal policies and develop an algorithm for computing ε-optimal policies. Finally, we apply our main results to a business system.
Keywords: semi-Markov decision processes, loss rate, risk probability, first passage time, optimal policy, iteration algorithm
MSC(2010): 90C40, 60J27
Citation: Huang X X, Zou X L, Guo X P. A minimization problem of the risk probability in first passage semi-Markov decision processes with loss rates.
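To make the successive approximation idea in the abstract above concrete, here is a minimal Python sketch for a deliberately simplified setting. It is not the paper's algorithm: it assumes a finite discrete-time model with positive integer one-step losses instead of a semi-Markov process with loss rates, and every name in it (`min_risk_probability`, `P`, `cost`, `target`, `max_level`) is hypothetical. The state is augmented with the remaining loss level, and the iteration starts from the zero function.

```python
from math import inf


def min_risk_probability(P, cost, target, max_level, tol=1e-12, max_iter=10_000):
    """Successive approximation for a simplified first passage risk problem.

    Assumptions (illustrative only, not taken from the paper):
      P[a][i][j]  probability of moving from state i to j under action a
      cost[i][a]  positive integer loss paid when leaving non-target state i
      target      set of target states; the process stops on entering it
    Returns F and policy, where F[i][lam] approximates the minimal probability
    that the total loss before reaching the target exceeds lam.
    """
    n_actions, n_states = len(P), len(P[0])
    F = [[0.0] * (max_level + 1) for _ in range(n_states)]
    policy = [[None] * (max_level + 1) for _ in range(n_states)]
    for _ in range(max_iter):
        new_F = [[0.0] * (max_level + 1) for _ in range(n_states)]
        delta = 0.0
        for i in range(n_states):
            if i in target:
                continue  # no further loss accrues inside the target set
            for lam in range(max_level + 1):
                best, best_a = inf, None
                for a in range(n_actions):
                    rem = lam - cost[i][a]  # loss level left after this step
                    if rem < 0:
                        val = 1.0  # the loss level is already exceeded
                    else:
                        val = sum(P[a][i][j] * F[j][rem] for j in range(n_states))
                    if val < best:
                        best, best_a = val, a
                new_F[i][lam] = best
                policy[i][lam] = best_a
                delta = max(delta, abs(best - F[i][lam]))
        F = new_F
        if delta < tol:
            break
    return F, policy


# Toy instance: state 1 is the target. In state 0, action 0 costs 1 and
# reaches the target with probability 0.5; action 1 costs 3 and reaches
# the target with certainty.
P = [
    [[0.5, 0.5], [0.0, 1.0]],  # action 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1
]
cost = [[1, 3], [0, 0]]
F, policy = min_risk_probability(P, cost, target={1}, max_level=3)
print(F[0])       # [1.0, 0.5, 0.25, 0.0]
print(policy[0])  # [0, 0, 0, 1]
```

In this toy instance the preferred action depends on the remaining loss level: the cheap, uncertain action is chosen at low levels, while the expensive, certain action becomes optimal once the level covers its cost. This is why the sketch indexes the policy by both state and level.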
This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multiple constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of the so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear program on the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear program, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.
This paper deals with the risk probability for finite horizon semi-Markov decision processes with loss rates. The criterion to be minimized is the risk probability that the total loss incurred during a finite horizon exceeds a loss level. For such an optimality problem, we first establish the optimality equation, and prove that the optimal value function is the unique solution to the optimality equation. We then show the existence of an optimal policy, and develop a value iteration algorithm for computing the value function and optimal policies. We also derive the approximation of the value function and the rules of iteration. Finally, a numerical example is given to illustrate our results.