In a classic Markov decision problem of Derman, Lieberman, and Ross (1975) an investor has an initial capital x from which to make investments, the opportunities for which occur randomly over time. An investment of size y results in profit P(y), and the aim is to maximize the sum of the profits obtained within a given time t. The problem is similar to a groundwater management problem of Burt (1964), the notorious bomber problem of Klinger and Brown (1968), and types of fighter problems addressed by Weber (1985), Shepp et al (1991) and Bartroff et al (2010a). In all these problems one allocates successive portions of a limited resource, optimally choosing an allocation y(x, t) as a function of the remaining resource x and the remaining time t. For their investment problem, Derman et al (1975) proved that an optimal policy has three monotonicity properties: (A) y(x, t) is nonincreasing in t, (B) y(x, t) is nondecreasing in x, and (C) x − y(x, t) is nondecreasing in x. Theirs is the only problem of its type for which all three properties are known to be true. In the bomber problem the status of (B) is unresolved. For the general fighter problem the status of (A) is unresolved. We survey what is known about these exceedingly difficult problems. We show that (A) and (C) remain true in the bomber problem, but that (B) is false if we very slightly relax the assumptions of the usual model. We give other new results, counterexamples and conjectures for these problems.

Keywords Bomber problem · fighter problem · groundwater management problem · investment problem · Markov decision problem

1 Stochastic sequential allocation problems
An investment problem

In a classic Markov decision problem posed by Derman, Lieberman, and Ross (1975) an investor has initial capital of x dollars, from which he can make withdrawals to fund investments. The opportunities for investment occur randomly as time proceeds. An investment of size y results in profit P(y), and the aim is to maximize the sum of the profits obtained within a given time t. Without more ado let us set out this problem's dynamic programming equation. With an eye to variations of this problem to come, we make some notational changes. We take the remaining number of uninvested dollars to be discrete (a nonnegative integer) and denote it by n (= 0, 1, . . .) rather than x. The remaining time is also discrete and denoted by t = 0, 1, . . . . We suppose that an investment opportunity occurs with probability p (= 1 − q), independently at each remaining time step. We replace P(y) with a(k). Let F(n, t) denote the expected total payoff from investments made over t further time steps, given initial capital n, and following an optimal policy. The dynamic programming equation is then

F(n, t) = q F(n, t − 1) + p \max_{k \in \{0, 1, \dots, n\}} \bigl[ a(k) + F(n − k, t − 1) \bigr],   (1)

with F(n, 0) = 0. Let k(n, t) denote a k which is maximizing on the right-hand side of (1). We might have chosen to ...

Richard Weber, Statistical Laboratory, University of Cambridge, Cambridge CB2 0WB, UK. E-mail: rrw1@cam.ac.uk
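As a concrete illustration, the recursion (1) can be evaluated directly by memoized dynamic programming. The sketch below is not from the original text: it assumes, purely for illustration, a concave profit function a(k) = √k and an opportunity probability p = 0.5; any choice of a(·) and p could be substituted.

```python
import math
from functools import lru_cache

p = 0.5          # probability an investment opportunity arrives at a given step (assumed)
q = 1 - p

def a(k):
    # Hypothetical concave profit function a(k) = sqrt(k); chosen for illustration only.
    return math.sqrt(k)

@lru_cache(maxsize=None)
def F(n, t):
    """Expected total profit from n uninvested dollars and t remaining steps,
    following an optimal policy: the recursion (1) with F(n, 0) = 0."""
    if t == 0:
        return 0.0
    return q * F(n, t - 1) + p * max(a(k) + F(n - k, t - 1) for k in range(n + 1))

def k_opt(n, t):
    """A maximizing investment size k(n, t) on the right-hand side of (1)."""
    return max(range(n + 1), key=lambda k: a(k) + F(n - k, t - 1))
```

With a concave a(·) one can use such a sketch to probe the monotonicity properties (A), (B) and (C) numerically, e.g. by tabulating k_opt(n, t) over a grid of n and t.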