In this article, the network throughput optimization problem is investigated using the theory of Markov decision processes (MDPs), with the device-to-device (D2D) mode selection problem formulated as a finite-stage discounted MDP. First, an MDP model of D2D communication mode selection is built; second, the optimal mode selection strategy is derived using a finite-stage backward iteration algorithm; and finally, the resulting mode selection strategy is evaluated through a large number of simulation experiments. The results show that the MDP-based mode selection method proposed in this article yields better mode selection strategies and achieves larger system throughput.
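To make the backward iteration step concrete, the following is a minimal sketch of backward induction for a finite-stage discounted MDP. It is not the article's model: the two channel states, the two modes (cellular and D2D), the transition probabilities, the per-stage throughput rewards, the zero terminal value, and the function name `backward_induction` are all assumptions chosen only for illustration.

```python
from typing import List, Tuple


def backward_induction(
    P: List[List[List[float]]],  # P[a][s][s2]: probability of moving s -> s2 under action a
    R: List[List[float]],        # R[s][a]: immediate reward (here, throughput) in state s under action a
    horizon: int,                # number of decision stages
    gamma: float,                # discount factor
) -> Tuple[List[float], List[List[int]]]:
    """Return the stage-0 values and the stage-dependent policy policy[t][s]."""
    n_states = len(R)
    n_actions = len(R[0])
    V = [0.0] * n_states  # assumption: terminal value is zero
    policy: List[List[int]] = []

    # Sweep from the last stage back to the first.
    for _ in range(horizon):
        new_V = []
        decision = []
        for s in range(n_states):
            # Immediate reward plus discounted expected value of the next stage.
            q = [
                R[s][a] + gamma * sum(P[a][s][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=lambda a: q[a])
            decision.append(best)
            new_V.append(q[best])
        V = new_V
        policy.insert(0, decision)  # earlier stages go to the front
    return V, policy


# Hypothetical toy instance (not taken from the article):
# states: 0 = poor D2D link, 1 = good D2D link
# actions: 0 = cellular mode, 1 = D2D mode
channel = [[0.7, 0.3], [0.4, 0.6]]
P = [channel, channel]           # assumption: the channel evolves independently of the chosen mode
R = [[1.0, 0.5], [1.0, 2.0]]     # assumed throughput per stage
values, policy = backward_induction(P, R, horizon=3, gamma=0.9)
print(policy)  # [[0, 1], [0, 1], [0, 1]]
```

In this toy instance the policy selects the cellular mode when the D2D link is poor and the D2D mode when it is good at every stage, because the assumed transitions do not depend on the action and so the choice is driven by the immediate throughput alone. With action-dependent transitions the policy can differ from stage to stage, which is why it is stored per stage.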