“…Learning to Navigate in 3D Environments: Realistic 3D environments and simulation platforms [1,11,84,76] provide egocentric visual observations [104,103,66,50,12,14] for performing question answering [43,26,78,25], instruction following [4,19,3], and active visual tracking [60,94,95]. Previous studies have shown that audio is a strong cue for obstacle avoidance and navigation [77,44,33,70,64,32,83]. However, these methods are either non-photorealistic, do not support AI agents, or render audio that is not geometrically and acoustically correct.…”