The modality by which object azimuths (directions) are presented affects learning of multiple locations. In Experiment 1, participants learned sets of three and five object azimuths specified by a visual virtual environment, spatial audition (3D sound), or auditory spatial language. Five azimuths were learned faster when specified by spatial modalities (vision, audition) than by language. Experiment 2 equated the modalities for proprioceptive cues and eliminated spatial cues unique to vision (optic flow) and audition (differential binaural signals). A learning disadvantage for spatial language remained. We attribute this result to the cost of indirect processing from words to spatial representations.

Information about spatial layout can be conveyed by sensory cues, as from vision or spatial audition, or abstractly by spatial language (e.g., "1 o'clock, 6 feet"). Although learning and memory for visually specified positions or object locations have been investigated (e.g., Musen 1966; Pezdek et al. 1986; Naveh-Benjamin 1987; Tresch et al. 1993; Postma and De Haan 1996; Chieffi and Allport 1997), little research compares learning and memory performance in 3D space across modalities (e.g., Battacchi et al. 1981). Recently, Loomis et al. (2002) demonstrated that spoken language could produce a spatial representation that functioned behaviorally like one derived from 3D sound, despite the fact that the neural pathways to spatial representation are quite different across these input modalities. They showed that when a single location was specified and listeners walked to it directly or indirectly, without vision, their degree of convergence along the direct and indirect paths was comparable for the two modalities. This result indicates a functionally equivalent representation but does not indicate its code, which might be supramodal or modality-specific, for example, a visuo-spatial image activated by all modalities. However, Loomis et al. (2002) found that congenitally blind participants performed updating equivalently with 3D sound and spatial language, which argues against visual recoding.

A functionally equivalent representation of location across modalities does not guarantee comparable processing demands for encoding. We asked here whether multiple azimuths could be learned from auditory spatial language as readily as from the directly spatial modalities, audition and vision. (Distance was not varied because of distortions in auditory distance perception; e.g., Loomis et al. 1998.) Participants were presented with a set of target objects, each at a specific azimuth, and were then probed in sequence with the object names and asked to indicate the corresponding azimuths, until a learning criterion was reached. To minimize effects caused by intermodal differences in temporal and spatial resolution or access to verbal codes (Potter and Faulconer 1975; Pezdek et al. 1986), the stimuli were sequentially presented verbal object labels, the presentation times were equated across modalities and allowed ample time f...