Humans and other animals effortlessly identify natural sounds and group them into behaviorally relevant categories. Yet, the acoustic features and neural transformations that enable sound recognition and the formation of perceptual categories are largely unknown. Here, using multichannel neural recordings in the auditory midbrain of unanesthetized female rabbits, we first demonstrate that neural ensemble activity displays highly structured correlations that vary with distinct natural sound stimuli. These stimulus-driven correlations can be used to accurately identify individual sounds from single response trials, even when the sounds do not differ in their spectral content. Combining neural recordings with an auditory model, we then show how correlations between frequency-organized auditory channels can contribute to the discrimination of not just individual sounds but sound categories. For both the model and the neural data, spectral and temporal correlations achieve similar categorization performance and appear to contribute equally. Moreover, both the neural and model classifiers achieve their best task performance when they accumulate evidence over a time frame of approximately 1-2 seconds, mirroring human perceptual trends. Together, these results suggest that time-frequency correlations in sounds may be reflected in the correlations between auditory midbrain ensembles, and that these correlations may play an important role in the identification and categorization of natural sounds.
Humans and other animals effortlessly identify sounds and group them into behaviorally relevant categories. Yet, the acoustic features and neural transformations that enable the formation of perceptual categories are largely unknown. Here we demonstrate that correlation statistics between frequency-organized cochlear sound channels are reflected in the neural ensemble activity of the auditory midbrain and that such activity, in turn, can contribute to the discrimination of perceptual categories. Using multichannel neural recordings in the auditory midbrain of unanesthetized rabbits, we first demonstrate that neuron ensemble correlations are highly structured in both time and frequency and can be decoded to distinguish sounds. Next, we develop a probabilistic framework for measuring the nonstationary spectro-temporal correlation statistics between frequency-organized channels in an auditory model. In a 13-category sound identification task, classification accuracy is consistently high (>80%), improving with sound duration and plateauing at approximately 1-3 seconds, mirroring human performance trends. Nonstationary short-term correlation statistics are more informative about the sound category than time-averaged correlation statistics (84% vs. 73% accuracy). When tested independently, the spectral and temporal correlations between the model outputs achieve similar performance and appear to contribute equally. These results outline a plausible neural code in which correlation statistics between neuron ensembles of different frequencies can be read out to identify and distinguish acoustic categories.
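The central measurement in these two abstracts, nonstationary spectro-temporal correlation statistics between frequency-organized channels, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' auditory model: a generic band-pass filterbank stands in for the cochlear front end, and the channel count, half-octave bandwidths, 250 ms analysis windows, and lag range are illustrative assumptions rather than parameters from the papers.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def bandpass_filterbank(x, fs, n_channels=16, fmin=200.0, fmax=6000.0):
    """Crude stand-in for a cochlear filterbank: band-pass filters with
    log-spaced center frequencies. Returns the Hilbert envelope of each
    channel as an (n_channels, n_samples) array."""
    cfs = np.geomspace(fmin, fmax, n_channels)
    envs = np.empty((n_channels, len(x)))
    for i, cf in enumerate(cfs):
        lo, hi = cf / 2 ** 0.25, cf * 2 ** 0.25      # ~half-octave bands
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        envs[i] = np.abs(hilbert(sosfiltfilt(sos, x)))
    return envs

def short_term_spectral_correlations(envs, fs, win_s=0.25):
    """Nonstationary 'spectral' correlations: within each short window,
    the pairwise correlations between channel envelopes, vectorized."""
    n_ch, n = envs.shape
    w = int(win_s * fs)
    iu = np.triu_indices(n_ch, k=1)
    return np.array([np.corrcoef(envs[:, s:s + w])[iu]
                     for s in range(0, n - w + 1, w)])

def short_term_temporal_correlations(envs, fs, win_s=0.25, max_lag=5):
    """Nonstationary 'temporal' correlations: within each window, each
    channel's envelope correlated with lagged copies of itself."""
    n_ch, n = envs.shape
    w = int(win_s * fs)
    feats = []
    for s in range(0, n - w + 1, w):
        seg = envs[:, s:s + w]
        feats.append([np.corrcoef(seg[ch, :-lag], seg[ch, lag:])[0, 1]
                      for ch in range(n_ch) for lag in range(1, max_lag + 1)])
    return np.array(feats)

# Usage on a 2 s noise token as a stand-in stimulus:
fs = 16000
x = np.random.default_rng(0).standard_normal(2 * fs)
envs = bandpass_filterbank(x, fs)
S = short_term_spectral_correlations(envs, fs)   # (n_windows, n_channel_pairs)
T = short_term_temporal_correlations(envs, fs)   # (n_windows, n_channels * max_lag)
```

Feeding the per-window spectral and temporal feature vectors to any standard classifier would then implement the kind of category read-out the abstracts describe, with accuracy expected to improve as windows are accumulated over longer sound durations.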
The perception of sound textures, a class of natural sounds defined by statistical sound structure, such as fire, wind, and rain, has been proposed to arise through the integration of time-averaged summary statistics. Where and how the auditory system might encode these summary statistics to create internal representations of these stationary sounds, however, is unknown. Here, using natural textures and synthetic variants with reduced statistics, we show that summary statistics modulate the correlations between frequency-organized neuron ensembles in the awake rabbit inferior colliculus (IC). These neural ensemble correlation statistics capture high-order sound structure and allow for accurate neural decoding in a single-trial recognition task with evidence accumulation times approaching 1 s. In contrast, the average activity across the neural ensemble (the neural spectrum) provides a fast (tens of milliseconds) and salient signal that contributes primarily to texture discrimination. Intriguingly, perceptual studies in human listeners reveal analogous trends: the sound spectrum is integrated quickly and serves as a salient discrimination cue, while high-order sound statistics are integrated slowly and contribute substantially more toward recognition. These findings suggest that statistical sound cues such as the sound spectrum and correlation structure are represented by distinct response statistics in auditory midbrain ensembles, and that these neural response statistics may have dissociable roles and time scales for the recognition and discrimination of natural sounds.
SIGNIFICANCE STATEMENT: Being able to recognize and discriminate natural sounds, such as a running stream, a crowd clapping, or rustling leaves, is a critical task of the normally functioning auditory system. Humans can easily perform such tasks, yet they can be particularly difficult for the hearing impaired, and they challenge our most sophisticated computer algorithms. This difficulty is attributed to the complex physical structure of such natural sounds and to the fact that they are not unique: they vary randomly, in a statistically defined manner, from one excerpt to another. Here we provide the first evidence, to our knowledge, that the central auditory system is able to encode and utilize statistical sound cues for natural sound recognition and discrimination behaviors.
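The dissociation reported here, a quickly integrated spectrum cue for discrimination versus slowly integrated correlation statistics for recognition, can be illustrated with a toy single-trial decoding simulation. Nothing below reproduces the neural data or the authors' decoder: the equicorrelated Gaussian "envelopes", the two synthetic texture classes, and the leave-one-out nearest-centroid readout are all assumptions, constructed so that the two classes share a time-averaged spectrum but differ in cross-channel correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ch, n_t, n_trials = 8, 2000, 20          # channels, samples, trials per class

def make_trials(rho):
    """Synthetic channel 'envelopes' for one texture class: every channel has
    the same marginal statistics (hence the same time-averaged spectrum),
    but the cross-channel correlation is set by rho."""
    cov = np.full((n_ch, n_ch), rho) + (1.0 - rho) * np.eye(n_ch)
    L = np.linalg.cholesky(cov)
    return [np.abs(L @ rng.standard_normal((n_ch, n_t))) for _ in range(n_trials)]

def spectrum_cue(tr):
    """Time-averaged activity per channel (analog of the neural spectrum)."""
    return tr.mean(axis=1)

def correlation_cue(tr):
    """Vectorized cross-channel correlation matrix (analog of the high-order cue)."""
    return np.corrcoef(tr)[np.triu_indices(n_ch, k=1)]

classes = {"texture_A": make_trials(0.1), "texture_B": make_trials(0.7)}

for name, cue in [("spectrum", spectrum_cue), ("correlation", correlation_cue)]:
    feats = {k: np.array([cue(t) for t in trials]) for k, trials in classes.items()}
    total = correct = 0
    # leave-one-out nearest-centroid decoding of single trials
    for k in feats:
        for i in range(n_trials):
            cents = {k2: (np.delete(X, i, axis=0) if k2 == k else X).mean(axis=0)
                     for k2, X in feats.items()}
            pred = min(cents, key=lambda k2: np.linalg.norm(feats[k][i] - cents[k2]))
            correct += pred == k
            total += 1
    print(f"{name} cue: {correct / total:.0%} single-trial accuracy")
# Expected: the spectrum cue sits near chance (the classes share a spectrum by
# construction), while the correlation cue separates the classes almost perfectly.
```

Because the classes are matched in their channel-wise means by construction, any above-chance decoding must come from the correlation structure, the same logic that the synthetic textures with reduced statistics exploit in the study above.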