Human episodic memories are (re)constructed, combining unique features with schema-based predictions, and share neural substrates with imagination. They also show systematic schema-based distortions that increase with consolidation. Here we present a computational model in which hippocampal replay (from an autoassociative network) trains generative models (variational autoencoders) in neocortex to (re)create sensory experiences via latent variable representations. Simulations using large image datasets reflect the effects of memory age and hippocampal lesions and the slow learning of statistical structure in agreement with previous models (Complementary Learning Systems and Multiple Trace Theory), but also explain schema-based distortions, imagination, inference, and continual representation learning in memory. Critically, the model suggests how unique and predictable elements of memories are stored and reconstructed by efficiently combining both hippocampal and neocortical systems, optimising the use of limited hippocampal storage. Finally, the model can be extended to sequential stimuli, including language, and multiple neocortical networks could be trained, including those with latent variable representations in entorhinal, medial prefrontal, and anterolateral temporal cortices. Overall, we believe hippocampal replay training neocortical generative models provides a comprehensive account of memory construction and consolidation.
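
As a rough illustration of the teacher-student arrangement described above, the following minimal sketch replays patterns from an autoassociative memory and uses them to train a second network to reconstruct its inputs. It is not the implementation used in the simulations. Several choices are assumptions made here for brevity: the autoassociative network is a one-step softmax-based (modern Hopfield style) retrieval rule, replay is triggered by random noise cues, and a linear autoencoder trained by plain gradient descent stands in for the variational autoencoder. All function names, dimensions, and hyperparameters (`beta`, `lr`, `steps`) are hypothetical.

```python
import numpy as np


def hopfield_retrieve(memories, cue, beta=8.0):
    """One-step retrieval from a softmax-based autoassociative memory.

    memories: (n, d) stored patterns; cue: (d,) query pattern.
    Returns a softmax-weighted combination of the stored patterns.
    """
    scores = beta * (memories @ cue)
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ memories


def replay(memories, n_samples, rng):
    """Assumed replay scheme: cue the memory with random noise vectors."""
    cues = rng.normal(size=(n_samples, memories.shape[1]))
    return np.stack([hopfield_retrieve(memories, c) for c in cues])


def train_autoencoder(data, k, rng, lr=0.05, steps=1000):
    """Linear autoencoder (a simple stand-in for a VAE) fitted by gradient descent.

    Returns encoder weights, decoder weights, and the loss at every step.
    """
    n, d = data.shape
    w_enc = 0.1 * rng.normal(size=(d, k))
    w_dec = 0.1 * rng.normal(size=(k, d))
    losses = []
    for _ in range(steps):
        latent = data @ w_enc            # (n, k) latent variables
        err = latent @ w_dec - data      # (n, d) reconstruction error
        losses.append(float((err ** 2).sum() / n))
        grad_dec = 2.0 / n * latent.T @ err
        grad_enc = 2.0 / n * data.T @ (err @ w_dec.T)
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec, losses


def demo(seed=0):
    """Store structured patterns, replay them, and train the student network."""
    rng = np.random.default_rng(seed)
    n, d, k = 20, 16, 2
    # Toy "experiences" with shared low-dimensional structure plus small noise.
    basis = rng.normal(size=(k, d))
    memories = rng.normal(size=(n, k)) @ basis + 0.05 * rng.normal(size=(n, d))
    memories /= np.linalg.norm(memories, axis=1, keepdims=True)
    replayed = replay(memories, n_samples=200, rng=rng)
    _, _, losses = train_autoencoder(replayed, k=k, rng=rng)
    return replayed, losses


if __name__ == "__main__":
    replayed, losses = demo()
    print(f"reconstruction loss: first step {losses[0]:.4f}, last step {losses[-1]:.4f}")
```

In this toy setting the student network never sees the original patterns directly, only what the autoassociative memory replays, which is the sense in which replay "trains" the generative model. A linear autoencoder captures only the shared low-dimensional structure; it has no sampling step and so cannot by itself illustrate imagination or schema-based distortion.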