“…Given the so-called value function, one carries out the first policy iteration (FPI) step, which typically yields the greatest improvement towards the optimal policy. In the context of dispatching problems, this approach has been used to minimize the blocking probability (see Krishnan [15,16] and Leeuwaarden et al. [17]) and the sojourn time (i.e., delay or latency) or its generalization to arbitrary holding costs (see, e.g., Krishnan [18], Sassen et al. [19], Bhulai et al. [20] and Hyytiä et al. [21][22][23]). Most of the dispatching systems considered have a rather complex state space (e.g., an infinite number of waiting places, a continuous range of remaining service times, etc.)…”
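
To make the FPI step concrete, the following is a minimal sketch under simplifying assumptions that the excerpt does not state: each server is an M/M/1-FCFS queue, the basic policy is a static random split giving server i a Poisson arrival rate `lam`, and the holding cost rate is the number of jobs in the system. Under these assumptions the relative value function of a single queue is v(n) - v(0) = n(n+1) / (2(mu - lam)), so admitting one more job costs v(n+1) - v(n) = (n+1) / (mu - lam). The FPI step then sends each arriving job to the server with the smallest such cost. The function names `marginal_admission_cost` and `fpi_dispatch` are hypothetical and used only for illustration.

```python
from typing import Sequence


def marginal_admission_cost(n: int, lam: float, mu: float) -> float:
    """Return v(n+1) - v(n) for an M/M/1 queue with holding cost rate n.

    Assumes arrival rate lam under the basic policy, service rate mu,
    and a stable queue (lam < mu).
    """
    if lam >= mu:
        raise ValueError("queue must be stable: lam < mu")
    return (n + 1) / (mu - lam)


def fpi_dispatch(
    queue_lengths: Sequence[int],
    basic_rates: Sequence[float],
    service_rates: Sequence[float],
) -> int:
    """Return the index of the server chosen by one FPI step.

    The job goes to the server whose marginal admission cost is smallest;
    ties are broken in favour of the lowest index.
    """
    costs = [
        marginal_admission_cost(n, lam, mu)
        for n, lam, mu in zip(queue_lengths, basic_rates, service_rates)
    ]
    return min(range(len(costs)), key=costs.__getitem__)


if __name__ == "__main__":
    # Two servers: a fast one holding 3 jobs and a slow, empty one.
    print(fpi_dispatch([3, 0], [0.4, 0.4], [2.0, 1.0]))
```

Note that the resulting rule depends on the current queue lengths even though the basic policy it was derived from is state-independent, which is where the improvement comes from. The more complex state spaces mentioned in the excerpt (e.g., continuous remaining service times) would need a different value function than the one assumed here.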