Pitch, the perceptual correlate of sound repetition rate or frequency, plays an important role in speech perception, music perception, and listening in complex acoustic environments. Despite the perceptual importance of pitch, the neural mechanisms that underlie it remain poorly understood. Although cortical regions responsive to pitch have been identified, little is known about how pitch information is extracted from the inner ear itself. The two primary theories of peripheral pitch coding involve stimulus-driven spike timing, or phase locking, in the auditory nerve (time code), and the spatial distribution of responses along the length of the cochlear partition (place code). To rule out the use of timing information, we tested pitch discrimination of very high-frequency tones (>8 kHz), well beyond the putative limit of phase locking. We found that high-frequency pure-tone discrimination was poor, but when the tones were combined into a harmonic complex, a dramatic improvement in discrimination ability was observed that exceeded performance predicted by the optimal integration of peripheral information from each of the component frequencies. The results are consistent with the existence of pitch-sensitive neurons that rely only on place-based information from multiple harmonically related components. The results also provide evidence against the common assumption that poor high-frequency pure-tone pitch perception is the result of peripheral neural-coding constraints. The finding that place-based spectral coding is sufficient to elicit complex pitch at high frequencies has important implications for the design of future neural prostheses to restore hearing to deaf individuals.

The question of how pitch is represented in the ear has been debated for over a century. Two competing theories involve timing information from neural spikes in the auditory nerve (time code) and the spatial distribution of neural activity along the length of the cochlear partition (place code). By using very high-frequency tones unlikely to be coded via time information, we discovered that information from the individual harmonics is combined so efficiently that performance exceeds theoretical predictions based on the optimal integration of information from each harmonic. The findings have important implications for the design of auditory prostheses because they suggest that enhanced spatial resolution alone may be sufficient to restore pitch via such implants.
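The "optimal integration" benchmark referred to above is typically the standard signal-detection-theory rule for combining independent observations: the predicted combined sensitivity is the root-sum-square of the per-component sensitivities, d'_opt = sqrt(Σ d'_i²). A minimal sketch of that prediction follows; the per-harmonic d' values are hypothetical illustrations, not data from the study:

```python
import math

def optimal_combined_dprime(dprimes):
    """Predicted sensitivity from optimally combining independent
    observations (signal-detection-theory rule):
    d'_opt = sqrt(sum_i d'_i ** 2)."""
    return math.sqrt(sum(d * d for d in dprimes))

# Hypothetical per-harmonic sensitivities for discriminating each
# high-frequency component on its own:
single_tone_dprimes = [0.5, 0.6, 0.4, 0.5]

predicted = optimal_combined_dprime(single_tone_dprimes)
# If the measured d' for the harmonic complex exceeds `predicted`,
# performance is "super-optimal" relative to independent channels.
print(round(predicted, 2))  # → 1.01
```

The rule assumes each harmonic contributes statistically independent information; exceeding it is what motivates the inference of neurons combining place information across harmonically related components.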
Sensitivity to temporal fine structure (TFS) decreases with increasing age. The monaural and binaural TFS tests appear to reflect at least somewhat distinct auditory processes.
The frequency following response (FFR), a scalp-recorded measure of phase-locked brainstem activity, is often assumed to reflect the pitch of sounds as perceived by humans. In two experiments, we investigated the characteristics of the FFR evoked by complex tones. FFR waveforms to alternating-polarity stimuli were averaged for each polarity and either added, to enhance envelope information, or subtracted, to enhance temporal fine structure information. In experiment 1, frequency-shifted complex tones, with all harmonics shifted by the same amount in hertz, were presented diotically. Only the autocorrelation functions (ACFs) of the subtraction-FFR waveforms showed a peak at a delay shifted in the direction of the expected pitch shifts. This expected pitch shift was also present in the ACFs of the output of an auditory nerve model. In experiment 2, the components of a harmonic complex with harmonic numbers 2, 3, and 4 were presented either to the same ear (“mono”) or the third harmonic was presented contralaterally to the ear receiving the even harmonics (“dichotic”). In the latter case, a pitch corresponding to the missing fundamental was still perceived. Monaural control conditions presenting only the even harmonics (“2 + 4”) or only the third harmonic (“3”) were also tested. Both the subtraction and the addition waveforms showed that (1) the FFR magnitude spectra for “dichotic” were similar to the sum of the spectra for the two monaural control conditions and lacked peaks at the fundamental frequency and other distortion products visible for “mono” and (2) ACFs for “dichotic” were similar to those for “2 + 4” and dissimilar to those for “mono.” The results indicate that the neural responses reflected in the FFR preserve monaural temporal information that may be important for pitch, but provide no evidence for any additional processing over and above that already present in the auditory periphery, and do not directly represent the pitch of dichotic stimuli.
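The polarity add/subtract decomposition and the ACF pitch-delay measure described above can be sketched in a few lines. This is an illustrative toy (a pure 100 Hz waveform standing in for averaged FFRs; function names and the lag search range are assumptions, not the study's actual analysis pipeline):

```python
import numpy as np

def env_tfs_from_polarities(resp_pos, resp_neg):
    """Combine averaged responses to opposite-polarity stimuli:
    adding emphasizes the polarity-invariant envelope, subtracting
    emphasizes the temporal fine structure (TFS)."""
    resp_pos = np.asarray(resp_pos, dtype=float)
    resp_neg = np.asarray(resp_neg, dtype=float)
    return (resp_pos + resp_neg) / 2.0, (resp_pos - resp_neg) / 2.0

def acf(x):
    """Normalized autocorrelation function; peak delays mark
    candidate pitch periods."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return r / r[0]

# Toy demo: 100 Hz fine structure sampled at 8 kHz for 100 ms.
fs = 8000
t = np.arange(0, 0.1, 1.0 / fs)
pos = np.sin(2 * np.pi * 100 * t)
neg = -pos  # polarity inversion flips the TFS but not the envelope

env, tfs = env_tfs_from_polarities(pos, neg)
# For this idealized pure tone the additive (envelope) trace cancels
# to zero, while the subtractive trace retains the fine structure.
a = acf(tfs)
lag = 20 + np.argmax(a[20:200])  # search 2.5–25 ms, skipping lag 0
print(lag)  # → 80 samples = 10 ms, i.e. the 100 Hz period
```

Real FFR analyses work on many-trial averages and compare ACF peak delays against the perceived (sometimes shifted) pitch period, but the add/subtract and ACF machinery is the same as this sketch.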
Pitch plays a crucial role in speech and music, but is highly degraded for people with cochlear implants, leading to severe communication challenges in noisy environments. Pitch is determined primarily by the first few spectrally resolved harmonics of a tone. In implants, access to this pitch is limited by poor spectral resolution, due to the limited number of channels and interactions between adjacent channels. Here we used noise-vocoder simulations to explore how many channels, and how little channel interaction, are required to elicit pitch. Results suggest that two to four times as many channels are needed, with channel interactions reduced by an order of magnitude, relative to what is available in current devices. These new constraints not only provide insights into the basic mechanisms of pitch coding in normal hearing but also suggest that spectrally based complex pitch is unlikely to be generated in implant users without significant changes in the method or site of stimulation.
Cochlear implant (CI) listeners typically perform poorly on tasks involving the pitch of complex tones. This limitation in performance is thought to be mainly due to the restricted number of active channels and the broad current spread that leads to channel interactions and subsequent loss of precise spectral information, with temporal information limited primarily to temporal-envelope cues. Little is known about the degree of spectral resolution required to perceive combinations of multiple pitches, or a single pitch in the presence of other interfering tones in the same spectral region. This study used noise-excited envelope vocoders that simulate the limited resolution of CIs to explore the perception of multiple pitches presented simultaneously. The results show that the resolution required for perceiving multiple complex pitches is comparable to that found in a previous study using single complex tones. Although relatively high performance can be achieved with 48 channels, performance remained near chance when even limited spectral spread (with filter slopes as steep as 144 dB/octave) was introduced to the simulations. Overall, these tight constraints suggest that current CI technology will not be able to convey the pitches of combinations of spectrally overlapping complex tones.
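The noise-excited envelope vocoders used in the two studies above follow a standard recipe: split the input into contiguous frequency bands, extract each band's temporal envelope, and use it to modulate band-limited noise. A minimal numpy-only sketch is below; it uses idealized brick-wall FFT filters (i.e., no channel interaction at all), whereas the studies deliberately manipulate finite filter slopes (e.g., 144 dB/octave) to simulate current spread. Channel count, band edges, and the envelope cutoff are illustrative choices, not the studies' parameters:

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Brick-wall bandpass via the FFT: zero all bins outside [lo, hi)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f >= hi)] = 0.0
    return np.fft.irfft(X, n=len(x))

def noise_vocode(x, fs, n_channels=8, f_lo=80.0, f_hi=6000.0, env_cut=50.0):
    """Noise-excited envelope vocoder sketch: per channel, extract the
    envelope (half-wave rectify + lowpass) and impose it on noise
    filtered into the same band."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced bands
    noise = np.random.default_rng(0).standard_normal(len(x))
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = bandpass_fft(x, fs, lo, hi)
        env = bandpass_fft(np.maximum(band, 0.0), fs, 0.0, env_cut)
        carrier = bandpass_fft(noise, fs, lo, hi)
        out += env * carrier
    return out

# Toy input: harmonic complex, F0 = 200 Hz, harmonics 1-10.
fs = 16000
t = np.arange(0, 0.5, 1.0 / fs)
tone = sum(np.sin(2 * np.pi * 200 * h * t) for h in range(1, 11))
vocoded = noise_vocode(tone, fs)
```

With so few channels, no single band resolves an individual low-numbered harmonic, which is why listeners lose spectrally based pitch; the studies' finding is that even far steeper (but finite) slopes and several times more channels than current implants provide still leave the cue unusable.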