Mobile edge computing (MEC) has recently emerged as a promising technology to relieve the tension between computation-intensive applications and resource-limited mobile terminals (MTs). In this paper, we study delay-optimal computation offloading in computation-constrained MEC systems. Because the computation capability of the MEC server is constrained, we explicitly consider the computation task queue at the MEC server. In this case, the task queue at the MT and the task queue at the MEC server are strongly coupled in a cascade manner, which creates complex interdependencies and brings new technical challenges. We model the computation offloading problem as an infinite-horizon average-cost Markov decision process (MDP) and approximate it by a virtual continuous time system (VCTS) with reflections. Unlike most existing works, we develop dynamic instantaneous rate estimation to derive closed-form approximate priority functions in different scenarios. Based on the approximate priority functions, we propose a closed-form multi-level water-filling computation offloading solution that characterizes the influence of both the local queue state information (LQSI) and the remote queue state information (RQSI). The solution is extended from the single-MT single-MEC-server scenario to multiple-MT multiple-MEC-server scenarios, and several insights are derived. Finally, simulation results show that the proposed scheme outperforms conventional schemes.

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

I. INTRODUCTION

Smart mobile terminals (MTs) with advanced communication and computation capabilities provide a pervasive and powerful platform for many emerging computation-intensive mobile applications, e.g., interactive gaming, character recognition, and natural language processing [2], [3].
These applications impose stringent requirements on the quality of computation experience, especially when they are delay-sensitive. Computation offloading [4], which transfers computation tasks from the MT to an offloading destination, is one of the fundamental services for improving computation performance, in particular the delay performance. In computation offloading services, the delay performance depends on both the communication capability of the MT and the computation capability of the offloading destination. Specifically,
• The communication capability of the MT: The offloading rate varies with the time-varying quality of the wireless channel between the MT and the offloading destination. Poor communication capability starves the offloading destination of tasks to compute while tasks accumulate at the MT, which induces a large queuing delay at the MT.
• The computation capability of the offloading destination: In practical scenarios, the offloaded tasks cannot be executed immediately because the computation capability of the offloading destination is finite. Both the computation time and the waiting time at ...
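The two capabilities above act on two queues in tandem: tasks leave the MT queue only as fast as the channel allows, and they then wait in the MEC-server queue until the server can compute them. The following minimal Python sketch illustrates this cascade under a hypothetical slotted model that is not the paper's system model. Task arrivals, "good channel" slots, and server completions are each Bernoulli, and the MT naively offloads one task whenever the channel is good. The function `simulate_cascade` and all of its parameters are illustrative assumptions.

```python
import random


def simulate_cascade(num_slots, p_arrival, p_good_channel, p_serve, seed=0):
    """Return time-average queue lengths (Q_MT, Q_MEC) of two queues in tandem.

    Hypothetical slotted model (an assumption, not the paper's model):
    - one task arrives at the MT with probability p_arrival per slot;
    - the MT offloads one task only when the channel is good
      (probability p_good_channel), a stand-in for a time-varying rate;
    - the MEC server finishes one task with probability p_serve per slot.
    """
    rng = random.Random(seed)
    q_mt = q_mec = 0
    total_mt = total_mec = 0
    for _ in range(num_slots):
        # Offloading is limited by the MT backlog and the channel state.
        sent = 1 if q_mt > 0 and rng.random() < p_good_channel else 0
        # Computing is limited by the server backlog and its capability;
        # a task offloaded in this slot can be computed from the next slot on.
        done = 1 if q_mec > 0 and rng.random() < p_serve else 0
        arrival = 1 if rng.random() < p_arrival else 0
        q_mt += arrival - sent
        # Offloaded tasks join the server queue: the cascade coupling.
        q_mec += sent - done
        total_mt += q_mt
        total_mec += q_mec
    return total_mt / num_slots, total_mec / num_slots


if __name__ == "__main__":
    # Poor channel, capable server: the backlog builds up at the MT.
    print(simulate_cascade(20_000, 0.5, 0.3, 0.9))
    # Good channel, constrained server: the backlog builds up at the MEC server.
    print(simulate_cascade(20_000, 0.5, 0.9, 0.3))
```

In the first call, tasks arrive faster (0.5 per slot) than the channel can carry them (0.3 per slot), so the MT queue should grow steadily while the server queue stays nearly empty. In the second call, the bottleneck moves to the server, so the backlog should shift to the MEC queue. Bernoulli dynamics were chosen only to keep the per-slot drift easy to read, e.g., p_arrival minus p_good_channel for the MT queue.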