People often imagine that everyday objects are something else: a turned-over bottle becomes a car, a teapot becomes a swan. Such pretense is common in play, pedagogy, and narratives. The relationship between a real object and its pretend counterpart is flexible, but not arbitrary. In this work, we used a novel behavioral and computational approach to study the features that guide the construction of visual pretense. In four studies (N = 720 in total), we show that people have systematic preferences in visual pretense, and that these preferences are better explained by spatial and physical alignment (specifically shape similarity) than by surface-feature similarity (such as color). We also find that people systematically align the subpart structure of real and pretend objects. Throughout our studies, we compare human performance to current multi-modal vision models and find that they do not account for people's behavior, likely because they rely on surface features rather than spatial and physical ones.