In this paper, we study the dynamic time-slot server selection problem in mobile edge computing (MEC) systems, which accounts for the dynamic access of users to edge servers and introduces dynamically changing factors that affect edge server workload, such as offloading policies, offloading ratios, users' transmit power, and the servers' reserved capacity. These factors make it difficult to achieve long-term optimization in the server selection process. To address these challenges, we model the server selection problem as a Markov Decision Process (MDP) and propose a Deep Reinforcement Learning (DRL) based algorithm to solve it. The selection strategy is learned from the server selection performance observed in previous time slots. Simulation results show that the proposed DRL-based algorithm achieves lower average latency than the benchmark algorithms.
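To make the MDP/DRL formulation concrete, the sketch below shows a minimal DQN-style loop in which an agent picks an edge server each time slot and learns from the observed latency. This is an illustrative assumption, not the paper's exact design: the server count, state features (per-server load and reserved capacity), network size, and the toy environment `env_step` are all hypothetical placeholders.

```python
# Minimal DQN-style sketch (hypothetical state/action/reward design)
# for per-time-slot edge server selection. Requires: torch, numpy.
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

N_SERVERS = 5              # action space: which edge server to offload to (assumed)
STATE_DIM = 2 * N_SERVERS  # e.g., per-server load and reserved capacity (assumed)

class QNet(nn.Module):
    """Small MLP approximating Q(state, action) for each candidate server."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_SERVERS))

    def forward(self, s):
        return self.net(s)

def env_step(action, rng):
    """Placeholder environment: returns (next_state, reward).
    Reward is negative latency; real dynamics would come from the MEC model
    (offloading ratio, transmit power, reserved capacity, etc.)."""
    latency = rng.random() + 0.5 * action / N_SERVERS  # toy latency
    next_state = rng.random(STATE_DIM).astype(np.float32)
    return next_state, -latency

def train(episodes=50, slots=100, gamma=0.9, eps=0.1, batch=32):
    rng = np.random.default_rng(0)
    q = QNet()
    opt = torch.optim.Adam(q.parameters(), lr=1e-3)
    buf = deque(maxlen=10_000)  # replay buffer of past selection outcomes
    state = rng.random(STATE_DIM).astype(np.float32)
    for _ in range(episodes):
        for _ in range(slots):
            # epsilon-greedy server selection for the current time slot
            if random.random() < eps:
                a = random.randrange(N_SERVERS)
            else:
                with torch.no_grad():
                    a = int(q(torch.from_numpy(state)).argmax())
            nxt, r = env_step(a, rng)
            buf.append((state, a, r, nxt))
            state = nxt
            if len(buf) >= batch:
                s, a_, r_, s2 = map(np.array, zip(*random.sample(buf, batch)))
                s = torch.from_numpy(s.astype(np.float32))
                s2 = torch.from_numpy(s2.astype(np.float32))
                r_t = torch.from_numpy(r_.astype(np.float32))
                a_t = torch.from_numpy(a_.astype(np.int64))
                # one-step TD target: r + gamma * max_a' Q(s', a')
                target = r_t + gamma * q(s2).max(dim=1).values.detach()
                pred = q(s).gather(1, a_t.unsqueeze(1)).squeeze(1)
                loss = nn.functional.mse_loss(pred, target)
                opt.zero_grad()
                loss.backward()
                opt.step()

if __name__ == "__main__":
    train()
```

The replay buffer and the negative-latency reward mirror the idea stated in the abstract: the policy is learned from server selection performance observed in previous time slots, with average latency as the quantity being minimized.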