Strelcyk et al. (2019) recently found that interaural phase discrimination in older hearing-impaired listeners was correlated with both visuospatial processing speed and interaural level discrimination. This suggests that temporal fine structure (TFS) processing relies on global processing speed and/or spatial cognition, though it is possible that complex auditory discrimination generally engages multiple cognitive domains. Here, 50 Veterans (mean age = 48.1, range = 30–60) with normal or near-normal hearing completed batteries of temporal processing and cognitive tests. Composite cognitive test scores reflecting processing speed/executive function (PS-EX) and working memory (WM) were obtained. Temporal processing tasks included measures of envelope (ENV; gap duration discrimination, forward masking) and TFS (frequency modulation detection, interaural phase modulation detection) processing. Bayesian hierarchical regression was used to fit psychometric functions simultaneously to all ENV and TFS tasks in all subjects. Fixed effects of PS-EX and WM on thresholds and slopes were estimated for the psychometric functions in each temporal task. In general, (i) PS-EX and WM influenced both TFS and ENV thresholds, but not slopes; and (ii) TFS thresholds were best explained by PS-EX scores, while ENV thresholds were best explained by WM scores. These findings suggest a specific relation between global processing speed and TFS processing.
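To make this modeling approach concrete, the following is a minimal sketch of one way such a hierarchical psychometric-function model could be specified in PyMC. All variable names, priors, and the logistic link with a 50% guess rate are illustrative assumptions, not the authors' actual specification.

```python
# Hypothetical sketch of a hierarchical psychometric-function model with
# cognitive composites (PS-EX, WM) as regressors on thresholds.
import pymc as pm

def build_model(x, y, subj, task, ps_ex, wm, n_subj, n_task):
    """x: stimulus values; y: 0/1 correct; subj/task: integer indices per
    trial; ps_ex, wm: standardized composite scores per trial."""
    with pm.Model() as model:
        alpha = pm.Normal("alpha", 0.0, 2.0, shape=n_task)   # task thresholds
        beta = pm.HalfNormal("beta", 2.0, shape=n_task)      # task slopes
        b_ps = pm.Normal("b_ps", 0.0, 1.0, shape=n_task)     # PS-EX effect
        b_wm = pm.Normal("b_wm", 0.0, 1.0, shape=n_task)     # WM effect
        sd_u = pm.HalfNormal("sd_u", 1.0)
        u = pm.Normal("u", 0.0, sd_u, shape=n_subj)          # subject offsets
        thresh = alpha[task] + b_ps[task] * ps_ex + b_wm[task] * wm + u[subj]
        # Logistic psychometric function with a 50% guess rate; slope effects
        # of PS-EX/WM would enter analogously, e.g., on log(beta).
        p = 0.5 + 0.5 * pm.math.invlogit(beta[task] * (x - thresh))
        pm.Bernoulli("obs", p=p, observed=y)
    return model
```

Fitting all tasks in one model, as here, lets the threshold and slope regressors be estimated jointly rather than task by task.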
According to signal detection theory, the ability to detect a signal is limited only by internal noise, which comprises peripheral and central sources. Here, we develop a statistical approach to parse central from peripheral noise. Fifty-two Veterans (mean age = 47.8, range = 30–60) with normal or near-normal hearing performed AXB discrimination for several temporal processing tasks: gap duration discrimination, forward masking, frequency modulation detection, and interaural phase modulation detection. After training, a single adaptive run (40 reversals) was completed for each task. Subjects also completed speech-in-noise testing (“Theo-Victor-Michael”) with four masker types (48 trials each): speech-shaped noise, speech-envelope modulated noise, and one and two competing talkers. Composite speech performance was estimated using principal component analysis. Bayesian hierarchical regression was used to estimate two-parameter psychometric functions (threshold, slope) simultaneously for all temporal tasks and subjects. Crucially, fixed (group-level) thresholds were estimated per task, but only a single random (subject-level) intercept was estimated: the mean across-task deviation from the group thresholds. We assume central noise is the primary factor limiting across-task performance. The principal speech scores were entered as regressors on this “central threshold.” Indeed, the central threshold was correlated with the principal speech scores, suggesting that central noise limits both temporal processing and speech-in-noise performance.
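A minimal sketch of how the shared "central threshold" constraint might be written in PyMC is given below; the priors, the use of a single composite speech score, and the logistic link are assumptions for illustration.

```python
# Hypothetical sketch: per-task fixed thresholds plus ONE subject-level
# intercept shared across all tasks (the "central threshold"), regressed
# on a composite speech score. Names and priors are assumptions.
import pymc as pm

def build_model(x, y, subj, task, speech_pc, n_subj, n_task):
    """speech_pc: principal-component speech score, one value per subject."""
    with pm.Model() as model:
        alpha = pm.Normal("alpha", 0.0, 2.0, shape=n_task)   # task thresholds
        beta = pm.HalfNormal("beta", 2.0, shape=n_task)      # task slopes
        b_speech = pm.Normal("b_speech", 0.0, 1.0)           # speech effect
        sd_c = pm.HalfNormal("sd_c", 1.0)
        # Single across-task subject deviation, centered on the speech score
        central = pm.Normal("central", b_speech * speech_pc, sd_c, shape=n_subj)
        thresh = alpha[task] + central[subj]
        p = 0.5 + 0.5 * pm.math.invlogit(beta[task] * (x - thresh))
        pm.Bernoulli("obs", p=p, observed=y)
    return model
```

Because every task shares the same subject offset, the relation between the central threshold and speech performance is read directly from the posterior for `b_speech`.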
Recent studies suggest the brain tracks both attended and unattended speech streams. Here, we describe the cortical mechanisms that support active talker segregation by vocal gender. Thirty-three participants with normal or near-normal hearing performed a competing speech task during fMRI scanning. The target talker was female and the competing talker was male. Spectrotemporal modulation filtering was applied to stochastically modulate female and male vocal pitch across trials. Using the modulation-filter patterns as predictors, spectrotemporal receptive fields (STRFs) were obtained at each voxel using coordinate descent. STRF weights associated with female-talker (∼6 cyc/kHz) and male-talker (∼12 cyc/kHz) pitch were analyzed across subjects to identify pitch-sensitive voxels (logical OR, corrected p < 0.01), which were then characterized by their preference for the female vs. male talker. Anterior regions in Heschl’s gyrus and the superior temporal gyrus (STG) responded best to the female talker, while posterior regions in STG and planum temporale (PT) responded best to the male talker. In a control task where the talkers did not compete, the same pattern was observed, but the posterior network shifted from STG to PT and responded to the acoustic boundary between the talkers (∼9 cyc/kHz), suggesting that acoustically coded pitch in PT becomes voice-coded in STG during active segregation.
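As a rough illustration of the voxelwise STRF fit, the sketch below uses scikit-learn's Lasso, whose solver is coordinate descent, to regress trial-wise BOLD amplitudes on the modulation-filter patterns. The data shapes, regularization strength, and function name are assumptions, not the authors' pipeline.

```python
# Hypothetical sketch of per-voxel STRF estimation via coordinate descent;
# scikit-learn's Lasso uses a coordinate-descent solver internally.
import numpy as np
from sklearn.linear_model import Lasso

def fit_strfs(X, bold, alpha=0.1):
    """X: (n_trials, n_modulation_channels) modulation-filter patterns.
    bold: (n_trials, n_voxels) trial-wise BOLD response amplitudes.
    Returns an (n_voxels, n_modulation_channels) array of STRF weights."""
    strfs = np.empty((bold.shape[1], X.shape[1]))
    for v in range(bold.shape[1]):
        model = Lasso(alpha=alpha, max_iter=5000)  # coordinate descent
        model.fit(X, bold[:, v])
        strfs[v] = model.coef_
    return strfs
```

Group analysis would then test the fitted weights at the female- (∼6 cyc/kHz) and male-talker (∼12 cyc/kHz) channels across subjects.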
Clinical speech-in-noise tests typically use materials that lack contextual constraint or are balanced for linguistic properties such as word/phoneme frequency. However, real-world linguistic context effects can be substantial and vary by listener and scenario. Here, 38 participants completed the Theo-Victor-Michael (TVM) speech test in four types of background: speech-shaped noise (SSN), speech-envelope modulated noise (envSSN), one competing talker (1T), and two competing talkers (2T) (Helfer and Freyman, 2009). The TVM is a matrix test using keywords from a corpus of one- and two-syllable nouns that vary considerably in word frequency (FREQ) and phonological neighborhood density (DENS). Bayesian logistic regression was used to estimate the effects of FREQ/DENS on TVM performance. A multinomial model was used for 1T/2T to assess reporting of target and distractor keywords. Overall, percent-correct recognition increased with increasing keyword FREQ and decreased with increasing keyword DENS. Effects were larger in SSN/envSSN than in 1T/2T. Statistically significant but small effects of FREQ/DENS were observed on distractor responses in 1T/2T. Adjusting performance for FREQ/DENS substantially shifted the distribution of scores, but only for SSN/envSSN. Performance in 1T/2T may be dominated by non-linguistic factors and/or be less sensitive to FREQ/DENS due to higher difficulty or linguistic competition from the background talkers. [Work supported by VA RR&D Service.]
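A minimal sketch of the logistic-regression stage, assuming per-masker intercepts and standardized FREQ/DENS covariates, is given below; the names and priors are illustrative rather than the authors' specification.

```python
# Hypothetical sketch of the Bayesian logistic regression of keyword
# recognition on word frequency (FREQ) and neighborhood density (DENS),
# with separate coefficients per masker type. Priors are assumptions.
import pymc as pm

def build_model(freq, dens, masker, y, n_masker):
    """freq, dens: standardized keyword covariates per trial; masker:
    integer masker index (SSN/envSSN/1T/2T); y: keyword correct (0/1)."""
    with pm.Model() as model:
        b0 = pm.Normal("b0", 0.0, 1.5, shape=n_masker)         # intercepts
        b_freq = pm.Normal("b_freq", 0.0, 1.0, shape=n_masker)
        b_dens = pm.Normal("b_dens", 0.0, 1.0, shape=n_masker)
        eta = b0[masker] + b_freq[masker] * freq + b_dens[masker] * dens
        pm.Bernoulli("obs", logit_p=eta, observed=y)
    return model
```

The multinomial model for 1T/2T would presumably replace the Bernoulli likelihood with a categorical one over target, distractor, and other responses.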
Previous studies have struggled to identify measures beyond the audiogram that reliably predict speech-in-noise scores. This may be because (i) different mechanisms mediate performance depending on the materials and task, and (ii) effects are not reproducible. Here, 38 listeners with normal/near-normal audiograms completed batteries of temporal auditory and cognitive tests, plus speech recognition (“Theo-Victor-Michael” test) in speech-shaped noise (SSN), speech-envelope modulated noise (envSSN), and one (1T) and two (2T) competing talkers. A two-stage Bayesian modeling approach was employed. In Stage 1, speech scores were corrected for target-word frequency/neighborhood density, psychometric function parameters were extracted from the temporal tests, and cognitive measures were reduced to three composite variables. Stage 2 then applied Gaussian process models to predict speech scores from the temporal and cognitive measures. Leave-one-out cross-validation and model stacking determined the best combination of predictive models. Performance in SSN/envSSN was best predicted by temporal envelope measures (forward masking, gap duration discrimination), while performance in 1T was best predicted by cognitive measures (executive function, processing speed). Temporal fine structure measures (frequency-modulation and interaural-phase-difference detection) predicted the number of 1T distractor responses. All models failed to predict 2T performance. These results show that prediction of speech-in-noise scores from suprathreshold “process” measures is highly task dependent.
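The prediction stage could look something like the sketch below, using scikit-learn's Gaussian process regressor evaluated by leave-one-out cross-validation. The kernel choice and the stack-by-LOO-error step are assumptions standing in for the Bayesian model stacking described above.

```python
# Hypothetical sketch of Stage 2: GP prediction of speech scores from a
# predictor set, scored by leave-one-out cross-validation. Kernel and
# function names are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import LeaveOneOut

def loo_predictions(X, y):
    """X: (n_subjects, n_predictors); y: speech scores. Returns LOO preds."""
    kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
    preds = np.empty_like(y, dtype=float)
    for train, test in LeaveOneOut().split(X):
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        gp.fit(X[train], y[train])
        preds[test] = gp.predict(X[test])
    return preds

# Each candidate predictor set (e.g., temporal-only vs. cognitive-only)
# would yield its own LOO predictions; stacking weights then combine the
# candidate models according to their out-of-sample error.
```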