Abstract. We investigate the problem of minimizing a certainty equivalent of the total or discounted cost generated by a Markov Decision Process (MDP) over a finite and an infinite horizon. The certainty equivalent is defined by U^{-1}(E U(Y)), where U is an increasing function. In contrast to the risk-neutral case, this optimization criterion takes the variability of the cost into account. It contains as a special case the classical risk-sensitive optimization criterion with an exponential utility. We show that this optimization problem can be solved by an ordinary MDP with extended state space and give conditions under which an optimal policy exists. In the case of an infinite time horizon, we show that the minimal discounted cost can be obtained by value iteration and can be characterized as the unique solution of a fixed point equation using a 'sandwich' argument. Interestingly, it turns out that in the case of a power utility the problem simplifies and is of similar complexity to the exponential utility case; however, it has not been treated in the literature so far. We also establish the validity (and convergence) of the policy improvement method. A simple numerical example, namely the classical repeated casino game, is considered to illustrate the influence of the certainty equivalent and its parameters. Finally, the average cost problem is also investigated. Surprisingly, it turns out that under suitable recurrence conditions on the MDP, the minimal average cost for a convex power utility U does not depend on U and is equal to the risk-neutral average cost. This is in contrast to the classical risk-sensitive criterion with exponential utility.
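As a hedged illustration (not part of the paper), the following Python sketch estimates the certainty equivalent U^{-1}(E U(Y)) by Monte Carlo for an exponential and a convex power utility; the cost distribution and the parameter values are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative cost sample with E[Y] = 3 (assumption, not from the paper).
Y = rng.gamma(shape=2.0, scale=1.5, size=100_000)

def certainty_equivalent(y, U, U_inv):
    """Estimate U^{-1}(E[U(Y)]) from the sample y."""
    return U_inv(np.mean(U(y)))

# Exponential utility U(x) = exp(gamma * x), gamma > 0: the classical risk-sensitive case.
gamma = 0.5
ce_exp = certainty_equivalent(Y, lambda x: np.exp(gamma * x), lambda u: np.log(u) / gamma)

# Convex power utility U(x) = x**p with p > 1.
p = 2.0
ce_pow = certainty_equivalent(Y, lambda x: x**p, lambda u: u ** (1.0 / p))

print(f"risk-neutral expectation E[Y]:             {Y.mean():.3f}")
print(f"certainty equivalent, exponential utility: {ce_exp:.3f}")
print(f"certainty equivalent, power utility:       {ce_pow:.3f}")
```

By Jensen's inequality, both certainty equivalents exceed the risk-neutral expectation E[Y], reflecting that an increasing convex U penalizes the variability of the cost.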
This paper is devoted to studying average optimality in continuous-time Markov decision processes with fairly general state and action spaces. The criterion to be maximized is the expected average reward. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We first provide two optimality inequalities with opposite directions and give suitable conditions under which the existence of solutions to these two inequalities is ensured. Then, from the two optimality inequalities, we prove the existence of optimal (deterministic) stationary policies by using the Dynkin formula. Moreover, we present a "semimartingale characterization" of an optimal stationary policy. Finally, we use a generalized Potlatch process with control to illustrate the difference between our conditions and those in the previous literature, and then further apply our results to average optimal control problems for generalized birth-death systems, upwardly skip-free processes and two queueing systems. The approach developed in this paper differs slightly from the "optimality inequality approach" widely used in the previous literature.

A continuous-time Markov decision process is specified by four primitive data: a state space S; an action space A with subsets A(x) of admissible actions, which may depend on the current state x ∈ S; transition rates q(·|x, a); and reward (or cost) rates r(x, a). Using these terms, we now briefly describe some existing work on the expected average criterion. When the state space is finite, a bounded solution to the average optimality equation (AOE) and methods for computing optimal stationary policies have been investigated in [23,26,30]. Since then, most work has focused on the case of a denumerable state space; for instance, see [6,24] for bounded transition and reward rates, [18,27,31,34,39,41] for bounded transition rates but unbounded reward rates, [16,35] for unbounded transition rates but bounded reward rates, and [12,13,17] for unbounded transition and reward rates. For the case of an arbitrary state space, to the best of our knowledge, only Doshi [5] and Hernández-Lerma [19] have addressed this issue; both establish the existence of optimal stationary policies. However, the treatments in [5] and [19] are restricted to uniformly bounded reward rates and nonnegative cost rates, respectively, and the AOE plays a key role in the proof of the existence of average optimal policies. Moreover, to establish the AOE, Doshi [5] needed the hypothesis that all admissible action sets are finite and that the relative difference of the optimal discounted value function is equicontinuous, whereas in [19] the existence of a solution to the AOE is assumed. O...
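In generic notation (assumed here for illustration, not quoted from the paper), with g a candidate optimal average reward, h and h' bias-type functions and q(dy|x, a) the transition rates, a pair of optimality inequalities with opposite directions takes the form

$$ g \;\le\; \sup_{a \in A(x)} \Big\{ r(x,a) + \int_S h(y)\, q(dy \mid x,a) \Big\}, \qquad g \;\ge\; \sup_{a \in A(x)} \Big\{ r(x,a) + \int_S h'(y)\, q(dy \mid x,a) \Big\}, \quad x \in S. $$

A deterministic stationary policy attaining the supremum is then shown, via the Dynkin formula, to be average optimal, which is the route described in the abstract above.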
Abstract. A financial market with one bond and one stock is considered, where the risk-free interest rate, the appreciation rate of the stock and the volatility of the stock depend on an external finite-state Markov chain. We investigate the problem of maximizing the expected utility from terminal wealth and solve it by stochastic control methods for different utility functions. Since the solutions are explicit, it is possible to compare the value function of the problem to the one obtained with constant (average) market data. The case of benchmark optimization is also considered.
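As a hedged sketch of the kind of explicit solution meant here (assuming a power utility U(x) = x^γ/γ with γ < 1 and regime-dependent coefficients r(i), μ(i), σ(i); the notation is ours, not the paper's), the optimal fraction of wealth invested in the stock while the chain is in regime i takes the Merton form

$$ \pi^*(i) \;=\; \frac{\mu(i) - r(i)}{(1-\gamma)\,\sigma^2(i)}, $$

so the investor acts myopically with respect to the current regime, while the regime switching enters the value function through a system of ordinary differential equations coupled by the generator of the Markov chain.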
We study portfolio optimization problems in which the drift rate of the stock is Markov modulated and the driving factors cannot be observed by the investor. Using results from filter theory, we reduce this problem to one with complete observation. In the cases of logarithmic and power utility, we solve the problem explicitly with the help of stochastic control methods. It turns out that the value function is a classical solution of the corresponding Hamilton-Jacobi-Bellman equation. As a special case, we investigate the so-called Bayesian case, i.e., where the drift rate is unknown but does not change over time. In this case, we prove a number of interesting properties of the optimal portfolio strategy. In particular, using the likelihood-ratio ordering, we can compare the optimal investment under an observable drift rate to that under an unobservable drift rate, and thus obtain the sign of the drift risk.
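For orientation (notation assumed, not quoted from the paper): in the logarithmic-utility case with a single stock of volatility σ and interest rate r, the reduction to complete observation typically leads to a certainty-equivalence form of the optimal fraction of wealth invested in the stock,

$$ \pi_t^* \;=\; \frac{\hat{\mu}_t - r}{\sigma^2}, \qquad \hat{\mu}_t = \mathbb{E}\big[\mu_t \mid \mathcal{F}_t^S\big], $$

where the unobservable drift is replaced by its conditional mean given the observed stock prices. For power utility the optimal fraction typically contains an additional correction term beyond this myopic part, which is where the comparison between the observable and unobservable cases becomes nontrivial.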