A linear function approximation based reinforcement learning algorithm is proposed for Markov decision processes with infinite-horizon risk-sensitive cost. Its convergence is proved using the 'o.d.e. method' for stochastic approximation. The scheme is also extended to continuous state space processes.

1. Introduction. Recent decades have seen major activity in approximate dynamic programming for Markov decision processes based on real or simulated data, using reinforcement learning algorithms. (See, e.g., Bertsekas and Tsitsiklis (1996) [10] and Sutton and Barto (1998) [30] for book-length treatments and Si et al. (2004) [28] for a flavour of more recent activity.) While most of this work has focused on additive cost criteria such as discounted or time-averaged cost, relatively little has been done for the multiplicative cost (or risk-sensitive cost, as it is better known). There is, however, considerable interest in this cost criterion, as it has important applications in finance, e.g., Bagchi and Sureshkumar (2002) [5]. Reinforcement learning schemes for risk-sensitive control have been proposed before, but these were 'raw' in the sense that there was no explicit approximation of the value function to beat down the curse of dimensionality. In the case of additive costs, there is a considerable body of work on such approximation architectures, one of the most popular being linear function approximation. Here one seeks an approximation of the value function as a linear combination of a moderate number of basis functions specified a priori. The learning scheme then iteratively learns the weights (or coefficients) of this linear combination instead of the full value function, which is a much higher dimensional object. The first rigorous analysis of such a scheme is in Tsitsiklis and Van Roy (1997) [31], where its convergence was proved for the problem of policy evaluation. Since then there have been several variations on the basic theme; see, e.g., Bertsekas, Borkar and Nedic (2004) [8] and the references therein. The aim of this article is to propose a similar linear function approximation based learning scheme for policy evaluation in risk-sensitive control and to justify it rigorously.
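The linear function approximation idea described in this introduction is most familiar in the additive, discounted-cost setting analysed by Tsitsiklis and Van Roy (1997) [31]. As a point of reference, the following is a minimal sketch of that baseline TD(0) policy-evaluation scheme; it is not the risk-sensitive algorithm proposed in the article, and the function names, the feature map phi, and the constant step size are illustrative assumptions.

import numpy as np

def linear_td0(sample_transitions, phi, num_features, alpha=0.05, gamma=0.95):
    """Standard TD(0) policy evaluation with linear function approximation.

    The value function V(s) is approximated as phi(s) . w, so only the weight
    vector w (num_features entries) is learned rather than the full value
    function.

    sample_transitions: iterable of (state, cost, next_state) tuples generated
        by simulating the fixed policy being evaluated.
    phi: feature map returning a length-num_features vector for a state
        (the basis functions chosen a priori).
    """
    w = np.zeros(num_features)
    for s, cost, s_next in sample_transitions:
        # Temporal-difference error under the discounted additive-cost criterion.
        delta = cost + gamma * (phi(s_next) @ w) - (phi(s) @ w)
        # Stochastic-approximation update of the weights along the feature direction.
        w += alpha * delta * phi(s)
    return w

The risk-sensitive scheme studied in the article learns a weight vector in an analogous fashion, but for a multiplicative cost criterion, so the update above should be read only as the additive-cost baseline that the introduction refers to.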
We study zero-sum risk-sensitive stochastic differential games on the infinite horizon with discounted and ergodic payoff criteria. Under certain assumptions, we establish the existence of values and saddle-point equilibria. We obtain our results by studying the corresponding Hamilton–Jacobi–Isaacs equations. Finally, we show that the value of the ergodic payoff criterion is a constant multiple of the maximal eigenvalue of the generators of the associated nonlinear semigroups.
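For context, a schematic form of the ergodic risk-sensitive payoff and of the eigenvalue-type Hamilton–Jacobi–Isaacs equation behind such results is sketched below. The notation (risk parameter \(\theta > 0\), controlled generator \(\mathcal{L}^{u,v}\), running cost \(r\), eigenpair \((\lambda,\psi)\)) follows a standard convention and is assumed here; it is not taken from the abstract.

\[
  J(x;u,v) \;=\; \limsup_{T\to\infty} \frac{1}{\theta T}\,
  \log \mathbb{E}_x\!\left[\exp\!\Big(\theta \int_0^T r(X_t,u_t,v_t)\,dt\Big)\right],
\]
\[
  \min_{u}\,\max_{v}\;\Big[\mathcal{L}^{u,v}\psi(x) + \theta\, r(x,u,v)\,\psi(x)\Big]
  \;=\; \lambda\,\psi(x), \qquad \psi > 0 .
\]

In this schematic form the ergodic value is \(\lambda/\theta\), i.e., a constant multiple of the principal eigenvalue \(\lambda\), which is the kind of relationship the abstract refers to.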
The infinite horizon risk-sensitive discounted-cost and ergodic-cost nonzero-sum stochastic games for controlled Markov chains with countably many states are analyzed. For the discounted-cost game, we prove the existence of Nash equilibrium strategies in the class of Markov strategies under fairly general conditions. Under an additional geometric ergodicity condition and a small cost criterion, the existence of Nash equilibrium strategies in the class of stationary Markov strategies is proved for the ergodic-cost game.
In this paper we consider a zero-sum Markov stopping game on a general state space with impulse strategies and an infinite-horizon discounted payoff, where the state dynamics is a weak Feller–Markov process. One of the key contributions is our analysis of this problem based on "shifted" strategies, proving that the original game can in effect be restricted to a sequence of Dynkin stopping games without affecting the optimality of the saddle-point equilibria, and hence completely solving some open problems in the existing literature. Under two quite general (weak) assumptions, we show the existence of the value of the game and the form of saddle-point (optimal) equilibria in the set of shifted strategies. Moreover, our methodology differs from the techniques used in the existing literature and is based on purely probabilistic arguments. In the process, we establish an interesting property of the underlying Feller–Markov process and the impossibility of an infinite number of impulses in finite time under saddle-point strategies, which is crucial for the verification result of the corresponding Isaacs–Bellman equations.