The entorhinal-hippocampal system plays a crucial role in spatial cognition and navigation. Since the discovery of grid cells in layer II of medial entorhinal cortex (MEC), several types of models have been proposed to explain their development and operation: continuous attractor network models, oscillatory interference models, and self-organizing map (SOM) models. Recent experiments have revealed the in vivo intracellular signatures of grid cells (Domnisoru et al., 2013; Schmidt-Hieber and Hausser, 2013), the primarily inhibitory recurrent connectivity of grid cells (Couey et al., 2013; Pastoll et al., 2013), and the topographic organization of grid cells within anatomically overlapping modules of multiple spatial scales along the dorsoventral axis of MEC (Stensola et al., 2012). These findings strongly constrain and challenge existing grid cell models. This article provides a computational explanation of how MEC cells with grid cell properties can emerge through learning within modular structures. Within this SOM model, grid cells with different rates of temporal integration learn modular properties with different spatial scales. Model grid cells learn in response to inputs from multiple scales of directionally selective stripe cells (Krupic et al., 2012; Mhatre et al., 2012), which perform path integration of the linear velocities experienced during navigation. Slower rates of grid cell temporal integration support learned associations with stripe cells of larger scales. The explanatory and predictive capabilities of the three types of grid cell models are compared in light of recent data to illustrate how the SOM model overcomes problems that the other types of models have not yet handled.
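
As a minimal illustration of the stripe cell input described above, the following Python sketch integrates linear velocity along a preferred direction and converts the resulting displacement into a spatially periodic response. It is a sketch under stated assumptions, not the model's implementation: the function name `stripe_cell_activity`, the Gaussian tuning profile, and all parameter values (spacing, width, time step) are hypothetical choices made only to show how stripe cells of different scales respond to the same trajectory.

```python
import numpy as np


def stripe_cell_activity(velocities, dt, direction_deg, spacing,
                         phase=0.0, width=0.1):
    """Hypothetical stripe cell response along a trajectory.

    velocities    : (T, 2) array of linear velocity samples (m/s)
    dt            : time step between samples (s)
    direction_deg : preferred direction of the stripe cell (degrees)
    spacing       : stripe spacing, i.e. the spatial scale (m)
    phase         : stripe phase as a fraction of the spacing
    width         : tuning width as a fraction of the spacing (assumed Gaussian)
    """
    theta = np.deg2rad(direction_deg)
    preferred = np.array([np.cos(theta), np.sin(theta)])

    # Path integration: accumulate displacement along the preferred direction.
    displacement = np.cumsum(velocities @ preferred) * dt

    # Signed distance to the nearest stripe center, in units of the spacing,
    # wrapped into [-0.5, 0.5).
    rel = (displacement / spacing - phase + 0.5) % 1.0 - 0.5

    # Assumed Gaussian tuning around each stripe center (peak value 1).
    return np.exp(-0.5 * (rel / width) ** 2)


# Two seconds of straight motion at 1 m/s along the x axis.
dt = 0.1
velocities = np.tile([1.0, 0.0], (20, 1))

# Same preferred direction, two spatial scales.
small_scale = stripe_cell_activity(velocities, dt, 0.0, spacing=0.5)
large_scale = stripe_cell_activity(velocities, dt, 0.0, spacing=1.0)

# A stripe cell tuned perpendicular to the motion sees no displacement.
orthogonal = stripe_cell_activity(velocities, dt, 90.0, spacing=0.5)
```

In this toy setting, the small-scale cell peaks every 0.5 m of travel and the large-scale cell every 1.0 m, while the perpendicular cell stays at its initial phase. In the SOM model, a learning grid cell would receive inputs from many such cells across directions, phases, and scales.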