Humans and other animals have a remarkable capacity to translate their position from one spatial frame of reference to another. The ability to move seamlessly between top-down and first-person views is important for navigation, memory formation, and other cognitive tasks. Evidence suggests that the medial temporal lobe and other cortical regions contribute to this function. To understand how a neural system might carry out these computations, we used variational autoencoders (VAEs) to reconstruct the first-person view from the top-down view of a robot simulation, and vice versa. Many latent variables in the VAEs had responses similar to those seen in neuronal recordings, including location-specific activity, head direction tuning, and encoding of distance to local objects. Place-specific responses were prominent when reconstructing a first-person view from a top-down view, whereas head direction–specific responses were prominent when reconstructing a top-down view from a first-person view. In both cases, the model recovered from perturbations through remapping rather than retraining. These results could advance our understanding of how brain regions support viewpoint linkages and transformations.