Research on spatial perspective-taking often focuses on the cognitive processes of isolated individuals as they adopt or maintain imagined perspectives. Collaborative studies of spatial perspective-taking typically examine speakers' linguistic choices, while overlooking their underlying processes and representations. We review evidence from two collaborative experiments that examine the contribution of social and representational cues to spatial perspective choices in both language and the organization of spatial memory. Across experiments, speakers organized their memory representations according to the convergence of various cues. When layouts were randomly configured and did not afford intrinsic cues, speakers encoded their partner's viewpoint in memory, if available, but did not use it as an organizing direction. On the other hand, when the layout afforded an intrinsic structure, speakers organized their spatial memories according to the person-centered perspective reinforced by the layout's structure. Similarly, in descriptions, speakers considered multiple cues, whether available a priori or during the interaction. They used partner-centered expressions (e.g., "to your right") more frequently when the partner's viewpoint was misaligned by a small offset or coincided with the layout's structure. Conversely, they used egocentric expressions more frequently when their own viewpoint coincided with the intrinsic structure or when the partner was misaligned by a computationally difficult, oblique offset. Based on these findings, we advocate for a framework for flexible perspective-taking: people weigh multiple cues (including social ones) to make attributions about the relative difficulty of perspective-taking for each partner, and adapt their behavior to minimize their collective effort. This framework is not specialized for spatial reasoning but instead emerges from the same principles and memory-dependent processes that govern perspective-taking in non-spatial tasks.