Understanding the location of acoustically relective surfaces in a room is a critical component in advanced sound processing. For example, intelligent speakers can use a room's acoustic geometry to improve playback quality, source separation accuracy, and speech recognition. In this paper, we present Synesthesia, a system for capturing the acoustic properties of a room using a single ixed speaker and a mobile phone that records audio at multiple locations. Using the arrival time of echoes, the system is able to reconstruct the position of relective surfaces like walls and then estimate properties like surface absorption. Previous work has shown how the acoustic room impulse response (RIR) of an environment can be used to analyze echoes within a space to reconstruct room geometry. The best current RIR-based approaches rely on high-end equipment and capturing an acoustic signal broadcast into space from a known ixed constellation of microphones. They also require the precise calibration and measurement of microphone positions. In addition, most approaches pose constraints on room geometries and limit the order of RIR to achieve accurate and consistent results. In this paper, we introduce a new approach that performs RIR imaging using a mobile phone that tracks its location with visual inertial odometry (VIO) to record a dense set of samples albeit with noise in their locations. We present a new approach that is able to relax several key assumptions on RIR and show through both experimentation and simulation that even with 20cm of uncertainty in the microphone locations provided by VIO, we are still able to reconstruct the room geometry with accurate shape and dimensions. We demonstrate this capability by prototyping a tool for acoustic engineers, that allows a user to view a room's estimated geometry and absorption overlaid on the actual sensed space with augmented reality. CCS CONCEPTS • Theory of computation → Nonconvex optimization; Unsupervised learning and clustering; • Human-centered computing → Mobile computing;