Markov decision processes (MDPs) in queues and networks have been an active research topic in many practical areas since the 1960s. This paper provides a detailed overview of this topic and tracks the evolution of many basic results. It also summarizes several interesting directions for future research. We hope that this overview can shed light on MDPs in queues and networks, and also on their extensive applications in various practical areas.

One main purpose of this paper is to provide an overview of research on MDPs in queues and networks over the last six decades. Such a survey necessarily draws on several other basic fields of study, such as Markov processes, queueing systems, queueing networks, Markov decision processes, sensitivity-based optimization, stochastic optimization, and fluid and diffusion control. Therefore, our analysis begins with three brief introductions: Markov processes and Markov decision processes, queues and queueing networks, and queueing dynamic control.
(a) Markov processes and Markov decision processes
Markov processes, together with the Markov property, were first introduced by the Russian mathematician Andrei Andreevich Markov in 1906. See Markov [238] for more details. Since then, as a basic mathematical tool, Markov processes have been discussed extensively by many authors; e.g., see the excellent books by Doob [99], Karlin [175], Karlin and Taylor [176], Chung [80], Anderson [21], Kemeny et al. [181], Meyn and Tweedie [241], Chen [77], Ethier and Kurtz [110] and so on. In 1960, Howard [165] was the first to propose and discuss the MDP (or stochastic dynamic programming), on the basis of his Ph.D. thesis, which opened up a new and important field at the intersection of Markov processes and dynamic programming (e.g., see Bellman and Kalaba [32]). Since then, MDPs have become not only an important branch of the theory of Markov processes but also a basic method in modern dynamic control theory. Crucially, MDPs have been strongly motivated by, and widely applied in, many practical areas over the past 60 years. Readers may refer to some excellent books, for example, on discrete-time MDPs by Puterman [261], Glasserman and Yao [143], Bertsekas [33], Bertsekas and Tsitsiklis [34], Hernández-Lerma and Lasserre [155, 156], Altman [9], Koole [193] and Hu and Yue [166]; on continuous-time MDPs by Guo and Hernández-Lerma [145]; on partially observable MDPs by Cassandra [67] and Krishnamurthy [196]; on competitive MDPs (i.e., stochastic games) by [127]; on sensitivity-based optimization by Cao [58]; on applications of MDPs by Feinberg and Shwartz (Eds.) [122] and Boucherie and Van Dijk (Eds.) [44]; and so on.
(b) Queues and queueing networks
In 1909, the Danish mathematician Agner Krarup Erlang published a pioneering work [109] on queueing theory, which started the study of queueing theory and traffic engineering. Over the past 100 years, queueing theory has been regarded as a key mathematical tool not only for analyzing practical s...