In recorded, live, and virtual- or augmented-reality immersive musical experiences, the audio presentation may not be prescribed by a geometrical and physical model of a room or acoustic enclosure (as is often expected, for instance, in architectural acoustic design and in video games or virtual reality). In previous work, a generic spatial audio rendering engine and interface were proposed, compatible with both physically- and perceptually-based representations of interactive audio scenes navigable at playback time. This parametric model builds upon and extends concepts previously proposed in spatial audio programming interfaces and media standard specifications, including OpenAL and MPEG-4. It prioritizes auditory plausibility over simulation exactness and facilitates low-complexity implementation, even for large multi-object audio scenes, by sharing a "multiroom" acoustic reverberation and reflection processor. In this paper, we review seminal research on subjective room acoustics and the design of the perceptually based spatialization interface realized in IRCAM Spat. The resulting "Spatial Audio Object Workstation" is consistent with familiar content creation tools and workflows, and extends them with fine control of the spatial motion (including distance and orientation), directionality, presence, and reverberance of each sound source. Finally, we propose a generic immersive audio coding format as a container of "6-DoF spatial audio objects" to advance toward an interoperable audio content and experience ecosystem.
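As an illustration only, the sketch below shows what a "6-DoF spatial audio object" parameter set of the kind described above might look like: six degrees of freedom (position and orientation) per source, plus the perceptual controls named in this abstract (directionality, presence, reverberance), with objects sharing a per-room reverberator. All field names and value ranges are assumptions for illustration, not the coding format proposed in the paper or the IRCAM Spat API.

```python
# Hypothetical sketch of a "6-DoF spatial audio object" container.
# Field names and ranges are illustrative assumptions, not the paper's format.
from dataclasses import dataclass, field

@dataclass
class SpatialAudioObject:
    """One sound source in a scene navigable at playback time:
    6 degrees of freedom (3 position + 3 orientation) plus
    perceptual rendering controls."""
    position: tuple[float, float, float] = (0.0, 0.0, 0.0)     # x, y, z in meters
    orientation: tuple[float, float, float] = (0.0, 0.0, 0.0)  # yaw, pitch, roll in degrees
    directivity_index: float = 0.0  # source directionality in dB; 0 = omnidirectional
    presence: float = 0.5           # perceptual source presence, normalized 0..1
    reverberance: float = 0.5       # perceptual reverberance, normalized 0..1
    room_id: int = 0                # index into a shared "multiroom" reverb processor

@dataclass
class AudioScene:
    """A multi-object scene whose objects share per-room reverberators,
    keeping rendering cost low even for large scenes."""
    objects: list[SpatialAudioObject] = field(default_factory=list)

# Example: two sources assigned to the same shared room reverberator.
scene = AudioScene(objects=[
    SpatialAudioObject(position=(1.0, 0.0, 2.0), presence=0.8, room_id=0),
    SpatialAudioObject(position=(-2.0, 0.0, 1.5), reverberance=0.7, room_id=0),
])
```

The design point this sketch reflects is the one made in the abstract: because the perceptual parameters are per-object but the reverberation processor is shared per room, rendering cost grows slowly with the number of objects.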