To guide navigation, the nervous system integrates multisensory self-motion and landmark information. We examined how these inputs generate the representation of self-location by recording entorhinal grid, border, and speed cells in mice navigating virtual environments. Manipulating the gain between the animal’s locomotion and the visual scene revealed that border cells responded to landmark cues, whereas grid and speed cells responded to combinations of locomotion, optic flow, and landmark cues in a context-dependent manner, with optic flow becoming more influential when it was faster than expected. A network model explained these results, providing principled regimes under which grid cells remain coherent with, or break away from, the landmark reference frame. Moreover, during path-integration-based navigation, mice estimated their position following the principles predicted by our recordings. Together, these results provide a quantitative framework for understanding how landmark and self-motion cues combine during navigation to generate spatial representations and guide behavior.
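The gain manipulation described above can be illustrated with a minimal sketch (not the authors' code; the function name, parameters, and numbers are hypothetical): the virtual scene advances by a gain factor times the animal's physical locomotion, so a gain below 1 makes optic flow slower than expected from self-motion, and a gain above 1 makes it faster.

```python
# Illustrative sketch of a visual gain manipulation in a virtual corridor.
# Hypothetical names and values; only the gain principle comes from the text.

def update_virtual_position(virtual_pos, run_speed, dt, gain=1.0):
    """Advance the virtual-scene position given treadmill running speed.

    virtual_pos : current position in the virtual corridor (cm)
    run_speed   : physical running speed (cm/s)
    dt          : time step (s)
    gain        : visual gain (1.0 = locomotion and scene matched)
    """
    return virtual_pos + gain * run_speed * dt

# Example: running at 10 cm/s for 1 s (100 steps of 10 ms) with gain 0.7
# yields only 7 cm of visual travel, i.e. optic flow slower than expected.
pos = 0.0
for _ in range(100):
    pos = update_virtual_position(pos, run_speed=10.0, dt=0.01, gain=0.7)
print(round(pos, 2))  # → 7.0
```

Dissociating locomotion from optic flow in this way is what allows the recordings to ask which cue each cell type follows.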