Human perception, cognition, and action are laced with seemingly arbitrary mappings. In particular, sound has a strong spatial connotation: Sounds are high and low, melodies rise and fall, and pitch systematically biases perceived sound elevation. The origins of such mappings are unknown. Are they the result of physiological constraints, do they reflect natural environmental statistics, or are they truly arbitrary? We recorded natural sounds from the environment, analyzed the elevation-dependent filtering of the outer ear, and measured frequency-dependent biases in human sound localization. We find that auditory scene statistics reveals a clear mapping between frequency and elevation. Perhaps more interestingly, this natural statistical mapping is tightly mirrored in both ear-filtering properties and in perceived sound location. This suggests that both sound localization behavior and ear anatomy are fine-tuned to the statistics of natural auditory scenes, likely providing the basis for the spatial connotation of human hearing.

frequency-elevation mapping | head-related transfer function | Bayesian modeling | cross-modal correspondence

The spatial connotation of auditory pitch is a universal hallmark of human cognition. High pitch is consistently mapped to high positions in space in a wide range of cognitive (1-3), perceptual (4-6), attentional (7-12), and linguistic functions (13), and the same mapping has been consistently found in infants as young as 4 mo of age (14). In spatial hearing, the perceived spatial elevation of pure tones is almost fully determined by frequency, rather than physical location, in a very systematic fashion [i.e., the Pratt effect (4, 5)]. Likewise, most natural languages use the same spatial attributes, high and low, to describe pitch (13), and throughout the history of musical notation high notes have been represented high on the staff.
However, a comprehensive account of the origins of the spatial connotation of auditory pitch is still missing. More than a century ago, Stumpf (13) suggested that it might stem from the statistics of natural auditory scenes, but this hypothesis has never been tested. This is a major omission: the frequency-elevation mapping often leads to remarkable inaccuracies in sound localization (4, 5) and can even trigger visual illusions (6), but it can also bring benefits such as reduced reaction times or improved detection performance (7-12).

Results

To trace the origins of the mapping between auditory frequency and perceived vertical elevation, we first measured whether this mapping is already present in the statistics of natural auditory signals. When trying to characterize the statistical properties of incoming signals, it is critical to distinguish between distal stimuli, the signals as they are generated in the environment, and proximal stimuli, the signals that reach the transducers (i.e., the middle and inner ear). In the case of auditory stimuli this is especially important, because the head and the outer ear operate as frequency- and elev...
The association between auditory pitch and spatial elevation is one of the most fascinating examples of cross-dimensional mappings: in a wide range of cognitive, perceptual, attentional and linguistic tasks, humans consistently display a positive, sometimes absolute, association between auditory pitch and spatial elevation. However, the origins of such a pervasive mapping are still largely unknown.

Through a combined analysis of environmental sounds and anthropometric measures, we demonstrate that, statistically speaking, this mapping is already present in both the distal and the proximal stimulus. Specifically, in the environment, high-pitched sounds are more likely to come from above; moreover, due to the filtering properties of the external ear, sounds coming from higher elevations have more energy at high frequencies.

Next, we investigated whether the internalized mapping depends on the statistics of the proximal or of the distal stimulus. In a psychophysical task, participants had to localize narrow band-pass noises with different central frequencies, while head- and world-centred reference frames were put into conflict by tilting participants' body orientation. The frequency of the sounds systematically biased localization in both head- and world-centred coordinates, and, remarkably, in agreement with the mappings measured in both the distal and the proximal stimulus.

These results clearly demonstrate that the cognitive mapping between pitch and elevation mirrors the statistical properties of the auditory signals. We argue that, on a shorter timescale, humans learn the statistical properties of auditory signals, while, on a longer timescale, the evolution of the acoustic properties of the external ear itself is shaped by the statistics of the acoustic environment.
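
The logic of the tilt manipulation can be pictured with a small geometric sketch. It is only an illustration under stated assumptions, not the authors' analysis: the function names, the restriction to a 2D frontal plane, and the bias magnitudes are all hypothetical, and the abstract gives neither the tilt angle nor the measured biases. The sketch shows that a head-centred and a world-centred frequency bias point the same way when the listener is upright and come apart once the body is tilted.

```python
import math


def rotate(vec, angle_deg):
    """Rotate a 2D (horizontal, vertical) vector counterclockwise by angle_deg."""
    a = math.radians(angle_deg)
    x, y = vec
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))


def bias_in_world(head_bias, world_bias, tilt_deg):
    """Total frequency-driven shift, expressed in world coordinates.

    head_bias:  hypothetical shift along the head's vertical axis (deg)
    world_bias: hypothetical shift along the world's vertical axis (deg)
    tilt_deg:   assumed body tilt in the frontal plane (0 = upright)
    """
    # Seen from the world, the head's "up" axis is world "up" rotated by the tilt.
    hx, hy = rotate((0.0, head_bias), tilt_deg)
    return (hx, hy + world_bias)


# Made-up magnitudes, chosen only to make the geometry visible.
upright = bias_in_world(head_bias=2.0, world_bias=3.0, tilt_deg=0.0)
tilted = bias_in_world(head_bias=2.0, world_bias=3.0, tilt_deg=90.0)
print("upright:", upright)
print("tilted: ", tilted)
```

With the listener upright, the two hypothetical components add along the same axis and cannot be told apart from responses alone. With the body tilted, the head-centred component is rotated away from the world vertical, so each component can in principle be estimated separately.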