Mobile edge computing (MEC) is considered a novel paradigm for computation-intensive and delay-sensitive tasks in fifth-generation (5G) networks and beyond. However, its uncertainty, arising from the dynamics and randomness on the mobile device, wireless channel, and edge network sides, results in high-dimensional, nonconvex, nonlinear, and NP-hard optimization problems. Thanks to evolved reinforcement learning (RL), by iteratively interacting with the dynamic and random environment, a trained agent can intelligently obtain the optimal policy in MEC. Furthermore, evolved variants such as deep RL (DRL) can achieve higher convergence speed and learning accuracy through parametric approximation of the large-scale state-action space. This paper provides a comprehensive review of research on RL-enabled MEC and offers insights for development in this area. More importantly, in connection with free mobility, dynamic channels, and distributed services, the MEC challenges that can be solved by different kinds of RL algorithms are identified, followed by how these challenges are addressed by RL solutions in diverse mobile applications. Finally, the open challenges are discussed to provide helpful guidance for future research on RL training and learning in MEC.

Index Terms: Mobile edge computing (MEC), network uncertainty, reinforcement learning (RL).

[Fig. 1. Diverse scenarios of MEC deployment in 5G networks and beyond, covering smart city, smart home, hazard/remote areas, smart agriculture, digital twin, holographic communication, AR/VR/XR, metaverse, and autonomous driving, and smart factory, with infrastructure such as small data centers, switches, base stations, traffic lights, satellites, ships, and airplanes.]

… performance. Its typical algorithms are K-means clustering, principal component analysis, and independent component analysis, which can be used in small cell clustering, heterogeneous network clustering, smart grid user classification, etc. However, in unreliable radio network environments, the classification accuracy decreases, which readily causes slow and inaccurate actions in MEC systems (a minimal K-means sketch is given at the end of this section).

• Different from the static solutions provided by supervised learning and unsupervised learning, RL gives a constantly evolving intelligent framework [21]-[27]. In RL [29], an agent is enabled to make proper decisions based on frequent interactions with the stochastic and dynamic environment. Built on Markov models, RL operates as a feedback mechanism (a closed loop) without prior knowledge of the input data, where, based on the previous and current states, the agent executes actions to maximize the reward function. Additionally, these behaviors, including the states, actions, and rewards, accumulate to generate experiences. The classic RL algorithm is Q-learning, which suffers from the curse of dimensionality caused by the large dimensions of the state-action spaces (see the tabular sketch at the end of this section). Accordingly, upon leveraging a low-dimensional representation of the high-dimensional state-action space, …
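To make the unsupervised clustering step above concrete, the following is a minimal K-means sketch for grouping small cells. The two features (traffic load and user density) and the sample values are hypothetical placeholders chosen for illustration, not data or a scheme from the surveyed works.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain K-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Distance of every point to every centroid, shape (n, k).
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Hypothetical small-cell features: (traffic load, user density).
cells = np.array([[0.9, 0.8], [0.85, 0.75], [0.2, 0.1],
                  [0.15, 0.2], [0.5, 0.55], [0.45, 0.5]])
labels, centroids = kmeans(cells, k=3)
print(labels)  # cluster index assigned to each small cell
```

The clustering quality here depends entirely on the measured features; as the bullet above notes, noisy measurements in unreliable radio environments degrade the resulting assignments.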
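The closed-loop interaction and the curse of dimensionality described in the last bullet can be illustrated with tabular Q-learning on a toy offloading problem. The state space (channel-quality and task-size levels), the latency-based reward model, and all numeric parameters below are assumptions for illustration only, not a specific scheme from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MEC model: state = (channel quality, task size),
# action = 0 (compute locally) or 1 (offload to the edge server).
N_CHANNEL, N_SIZE, N_ACTIONS = 4, 4, 2
N_STATES = N_CHANNEL * N_SIZE

def step(state, action):
    """Return (reward, next_state); the reward is negative latency under
    an assumed model where offloading pays off only on good channels."""
    channel, size = divmod(state, N_SIZE)
    if action == 1:   # offload: latency shrinks as the channel improves
        latency = (size + 1) / (channel + 1)
    else:             # local: latency grows with the task size
        latency = (size + 1) / 2.0
    next_state = rng.integers(N_STATES)  # random next task/channel arrival
    return -latency, next_state

# One Q-value per state-action pair: the table size is the product of all
# state dimensions, which is the curse of dimensionality noted above.
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.9, 0.1  # learning rate, discount, exploration

state = rng.integers(N_STATES)
for _ in range(20000):
    # Epsilon-greedy closed loop: observe the state, act, collect the reward.
    action = rng.integers(N_ACTIONS) if rng.random() < eps else int(Q[state].argmax())
    reward, next_state = step(state, action)
    # Standard Q-learning update toward the bootstrapped target.
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

# Learned policy per (channel, size): 1 = offload, 0 = local; the agent
# should learn to offload when the channel-quality index is high.
print(Q.reshape(N_CHANNEL, N_SIZE, N_ACTIONS).argmax(axis=2))
```

Every additional state variable multiplies the number of Q-table rows, which is precisely why DRL replaces the table with a parametric approximation over the high-dimensional state-action space.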