A linear function approximation based reinforcement learning algorithm is proposed for Markov decision processes with infinite-horizon risk-sensitive cost. Its convergence is proved using the 'o.d.e. method' for stochastic approximation. The scheme is also extended to continuous state space processes.

1. Introduction.

Recent decades have seen major activity in approximate dynamic programming for Markov decision processes based on real or simulated data, using reinforcement learning algorithms. (See, e.g., Bertsekas and Tsitsiklis (1996) [10] and Sutton and Barto (1998) [30] for book-length treatments, and Si et al. (2004) [28] for a flavour of more recent activity.) While most of this work has focused on additive cost criteria such as discounted or time-averaged cost, relatively little has been done for the multiplicative cost (or risk-sensitive cost, as it is better known). There is, however, a lot of interest in this cost criterion, as it has important applications in finance, e.g., Bagchi and Sureshkumar (2002) [5]. Reinforcement learning algorithms for risk-sensitive control were developed in earlier work. These were 'raw' in the sense that there was no explicit approximation of the value function to beat down the curse of dimensionality. In the case of additive costs, there is a considerable body of work on such approximation architectures, one of the most popular being linear function approximation. Here one seeks an approximation of the value function as a linear combination of a moderate number of basis functions specified a priori. The learning scheme then iteratively learns the weights (or coefficients) of this linear combination instead of learning the full value function, which is a much higher dimensional object. The first rigorous analysis of such a scheme is in Tsitsiklis and Van Roy (1997) [31], where its convergence was proved for the problem of policy evaluation. Since then there have been several variations on the basic theme; see, e.g., Bertsekas, Borkar and Nedic (2004) [8] and the references therein. The aim of this article is to propose a similar linear function approximation based learning scheme for policy evaluation in risk-sensitive control and to justify it rigorously.
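For concreteness, the infinite-horizon risk-sensitive cost alluded to above is usually taken to be the following exponential criterion; the notation (state process {X_n}, control sequence {Z_n}, running cost c) is introduced here for illustration and is not fixed by this section:

    \[
      J \;=\; \limsup_{N \to \infty} \frac{1}{N} \,\log
      E\!\left[\exp\!\Big(\sum_{n=0}^{N-1} c(X_n, Z_n)\Big)\right].
    \]

The criterion is called 'multiplicative' because the exponential of the sum is the product of per-stage factors, E[\prod_n e^{c(X_n, Z_n)}], in contrast to the expected sum appearing in discounted or time-averaged cost.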
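As an illustration of the linear function approximation idea described above, the following is a minimal sketch of temporal-difference policy evaluation with a linear architecture in the spirit of Tsitsiklis and Van Roy (1997) [31], written for the simpler discounted-cost case. It is not the risk-sensitive algorithm proposed in this article; the toy chain, the feature map phi, and the step-size choice are all illustrative assumptions.

    import numpy as np

    # Toy two-state Markov chain under a fixed policy:
    # transition matrix P and per-stage cost c.
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
    c = np.array([1.0, 2.0])
    gamma = 0.95                  # discount factor
    rng = np.random.default_rng(0)

    def phi(s):
        """Feature map: the value function is approximated as V(s) ~ phi(s) @ w."""
        return np.array([1.0, float(s)])   # two basis functions: constant and identity

    w = np.zeros(2)               # weights of the linear combination, learned iteratively
    s = 0
    for n in range(1, 200_000):
        s_next = rng.choice(2, p=P[s])
        # TD(0) error: one-step cost plus discounted next-state estimate,
        # minus the current estimate.
        delta = c[s] + gamma * phi(s_next) @ w - phi(s) @ w
        # Stochastic approximation step with decreasing step size 1/n.
        w += (1.0 / n) * delta * phi(s)
        s = s_next

    print("learned weights:", w)
    print("approximate values:", [phi(s) @ w for s in (0, 1)])

Note that only the weight vector w (here two-dimensional) is learned, rather than the value function itself; this is the dimensionality reduction the passage above refers to, since in a realistic problem the number of basis functions is far smaller than the number of states.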