The spatial responses of many of the cells recorded in all layers of rodent medial entorhinal cortex (mEC) show a triangular grid pattern, which appears to provide an accurate population code for position and, once established, might be based in part on path-integration mechanisms. Competing models, each partially contradicted by experimental observations, try to explain how the grid-like pattern emerges in terms of network interactions, of interactions with theta oscillations or, in the model we have proposed, of single-unit mechanisms alone.
Grid axes are tightly aligned across simultaneously recorded units. Recent experimental findings have shown that grids can often be better described as elliptical rather than purely circular and that, beyond the mutual alignment of their grid axes, the ellipses also tend to orient their long axes along preferred directions. Are grid alignment and ellipse orientation the same phenomenon? Does grid alignment result from single-unit mechanisms, or does it require network interactions?
We address these issues by refining our model to describe specifically the spontaneous emergence of conjunctive grid-by-head-direction cells in layers III, V and VI of mEC. We find that tight alignment can be produced by recurrent collateral interactions, but that this requires head-direction modulation. Through a competitive learning process driven by spatial inputs, grid fields then form already aligned and with randomly distributed spatial phases. In addition, we find that the self-organization process is influenced by the behavior of the simulated rat. The common grid alignment often orients along preferred running directions. The shape of individual grids is distorted towards an elliptical arrangement when some speed anisotropy is present in exploration behavior. Speed anisotropy on its own also tends to align grids, even without collaterals, but this alignment is loose.
Finally, the alignment of spatial grid fields in multiple environments shows that the network expresses the same set of grid fields across environments, up to a coherent rotation and translation. Thus, an efficient metric encoding of space may emerge through spontaneous pattern formation at the single-unit level, but it is coherent, and hence context-invariant, if aided by collateral interactions.