“…[11], [17], and [23]), the bias and the overtaking optimality criteria (which choose an average optimal policy with the maximal expected reward growth as the time horizon goes to ∞; see, e.g., [7], [8], [10, p. 132], [12], [16], and [19, Chapter 10]), and the so-called discount-sensitive criteria (which choose policies that are asymptotically optimal as the discount rate converges to 0; see [7], [13], [15], [19, Chapter 10], and [22]), among others.…”