The hippocampus and ventromedial prefrontal cortex (vmPFC) play key roles in numerous cognitive domains including mind-wandering, episodic memory, and imagining the future. Perspectives differ on precisely how they support these diverse functions, but there is general agreement that this support involves constructing representations composed of numerous elements. Visual scenes have been deployed extensively in cognitive neuroscience because they are paradigmatic multielement stimuli. However, it remains unclear whether scenes, rather than other types of multifeature stimuli, preferentially engage hippocampus and vmPFC. Here, we leveraged the high temporal resolution of magnetoencephalography to test participants as they gradually built scene imagery from three successive auditorily presented object descriptions and an imagined 3-D space. This was contrasted with constructing mental images of nonscene arrays that were composed of three objects and an imagined 2-D space. The scene and array stimuli were, therefore, highly matched, and this paradigm permitted a closer examination of step-by-step mental construction than has been undertaken previously. We observed modulation of theta power in our two regions of interest during scene relative to array construction: in anterior hippocampus during the initial stage, and in vmPFC during the first two stages. Moreover, the scene-specific anterior hippocampal activity during the first construction stage was driven by the vmPFC, with mutual entrainment between the two brain regions thereafter. These findings suggest that hippocampal and vmPFC neural activity is especially tuned to scene representations during the earliest stage of their formation, with implications for theories of how these brain areas enable cognitive functions such as episodic memory.