Two objective measures of human cochlear tuning, using stimulus-frequency otoacoustic emissions (SFOAEs), have been proposed. One measure used the SFOAE phase-gradient delay and the other two-tone suppression (2TS) tuning curves. Here, it is hypothesized that the two measures lead to different frequency functions in the same listener. Two experiments were conducted in ten young adult normal-hearing listeners in three frequency bands (1-2 kHz, 3-4 kHz, and 5-6 kHz). Experiment 1 recorded SFOAE latency as a function of stimulus frequency, and experiment 2 recorded 2TS isoinput tuning curves. In both cases, the output was converted into a sharpness-of-tuning factor based on the equivalent rectangular bandwidth. In both experiments, the sharpness-of-tuning curves were frequency dependent, yielding sharper relative tuning with increasing frequency, although only a weak frequency dependence was observed for experiment 2, consistent with objective and behavioural estimates from the literature. Most importantly, the absolute difference between the two tuning estimates was very large and statistically significant. It is argued that the 2TS estimates of cochlear tuning likely represent the underlying properties of the suppression mechanism, and not necessarily cochlear tuning itself. Thus, the phase-gradient delay estimate is the one more likely to reflect cochlear tuning.
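The sharpness-of-tuning factor referred to above is conventionally defined as Q_ERB = f_c / ERB, where the equivalent rectangular bandwidth (ERB) is the width of a rectangle with the same peak and area as the power response of the tuning curve. The sketch below illustrates that definition on a toy Gaussian tuning curve; it is a minimal illustration, not the study's analysis pipeline, and the curve parameters are arbitrary assumptions.

```python
import numpy as np

def q_erb(freqs, power_response):
    """Sharpness-of-tuning factor Q_ERB = f_c / ERB.

    The equivalent rectangular bandwidth (ERB) is the area under the
    power response divided by its peak value (trapezoidal integration).
    """
    peak_idx = np.argmax(power_response)
    f_c = freqs[peak_idx]
    # Trapezoidal rule, written out to avoid version-specific numpy names
    area = np.sum((power_response[1:] + power_response[:-1]) / 2.0
                  * np.diff(freqs))
    erb = area / power_response[peak_idx]
    return f_c / erb

# Toy Gaussian tuning curve centred at 1 kHz with sigma = 50 Hz
freqs = np.linspace(500.0, 1500.0, 1001)
response = np.exp(-0.5 * ((freqs - 1000.0) / 50.0) ** 2)
print(round(q_erb(freqs, response), 2))  # ≈ 7.98
```

For a Gaussian, ERB = sigma * sqrt(2*pi) ≈ 125.3 Hz, so Q_ERB ≈ 1000 / 125.3 ≈ 7.98, which the numerical estimate reproduces.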
Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements. Several components may be responsible for the success of these state-of-the-art approaches: the system architecture, a time frame concatenation technique, and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech intelligibility in normal-hearing listeners. A substantial improvement of 25.4 percentage points in speech intelligibility scores was found going from a subband-based architecture, in which a Gaussian Mixture Model-based classifier predicts the distributions of speech and noise for each frequency channel, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where the units are assigned a continuous value between zero and one. Therefore, both components play significant roles, and by combining them, speech intelligibility improvements were obtained in a six-talker condition at a low signal-to-noise ratio.
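The two learning objectives contrasted above can be stated compactly. The sketch below gives textbook definitions of both masks from oracle speech and noise powers; the local criterion of 0 dB and the compression exponent beta = 0.5 are common defaults assumed here, not parameters reported by the study.

```python
import numpy as np

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Label each time-frequency unit 1 (speech-dominated) or 0
    (noise-dominated) by comparing the local SNR to a criterion in dB."""
    snr_db = 10.0 * np.log10(speech_power / noise_power)
    return (snr_db > lc_db).astype(float)

def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """Assign each unit a continuous gain between zero and one."""
    return (speech_power / (speech_power + noise_power)) ** beta

# Toy 2x2 time-frequency grid of powers (noise power fixed at 1)
s = np.array([[4.0, 1.0], [9.0, 0.25]])
n = np.ones_like(s)
print(ideal_binary_mask(s, n))  # hard 0/1 labels
print(ideal_ratio_mask(s, n))   # soft gains in (0, 1)
```

Note that the binary mask discards the distinction between a unit at +1 dB and one at +20 dB, whereas the ratio mask preserves it as a graded gain, which is one intuition for the intelligibility benefit reported above.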
Computational speech segregation aims to automatically segregate speech from interfering noise, often by employing ideal binary mask estimation. Several studies have tried to exploit contextual information in speech to improve mask estimation accuracy via two frequently used strategies: (1) incorporating delta features and (2) employing support vector machine (SVM)-based integration. In this study, two experiments were conducted. In Experiment I, the impact of exploiting spectro-temporal context using these strategies was investigated in stationary and six-talker noise. In Experiment II, the delta features were explored in detail and tested in a setup that considered novel noise segments of the six-talker noise. Computing delta features led to higher intelligibility than employing SVM-based integration, and intelligibility increased with the amount of spectral information exploited via the delta features. The system did not, however, generalize well to novel segments of this noise type. Measured intelligibility was subsequently compared to extended short-term objective intelligibility, hit minus false-alarm rate, and the amount of mask clustering. None of these objective measures alone could account for measured intelligibility. The findings may have implications for the design of speech segregation systems, and for the selection of a cost function that correlates with intelligibility.
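The delta features mentioned above are typically computed with the standard linear-regression formula, here applied across the frequency axis so that each channel's feature is augmented with the local spectral slope. The sketch below is a minimal illustration under that assumption; the window half-width of two channels and the edge padding are choices made for the example, not the study's configuration.

```python
import numpy as np

def spectral_deltas(features, width=2):
    """Delta features across the frequency axis of a (channels, frames)
    feature matrix, using the standard regression formula
    delta[c] = sum_n n * (x[c+n] - x[c-n]) / (2 * sum_n n^2)."""
    norm = 2.0 * sum(n * n for n in range(1, width + 1))
    # Repeat the edge channels so every output channel has full context
    padded = np.pad(features, ((width, width), (0, 0)), mode='edge')
    n_channels = features.shape[0]
    delta = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        delta += n * (padded[width + n:width + n + n_channels, :]
                      - padded[width - n:width - n + n_channels, :])
    return delta / norm

# A linear spectral ramp has slope 1, so interior deltas are 1
ramp = np.arange(9.0).reshape(9, 1)
print(spectral_deltas(ramp)[4, 0])  # 1.0
```

Increasing `width` widens the spectral neighbourhood each delta summarizes, which corresponds to the "amount of spectral information exploited" varied in Experiment II.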
The advanced combination encoder (ACE™) is an established speech-coding strategy in cochlear-implant processing that selects a number of frequency channels based on amplitudes. However, speech intelligibility outcomes with this strategy are limited in noisy conditions. To improve speech intelligibility, either noise-dominant channels can be attenuated prior to ACE™ with noise reduction or, alternatively, channels can be selected based on estimated signal-to-noise ratios. A noise power estimation stage is, therefore, required. This study investigated the impact of noise power estimation in noise-reduction and channel-selection strategies. Results imply that estimation with improved noise-tracking capabilities does not necessarily translate into increased speech intelligibility.
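The channel-selection alternative described above replaces ACE™'s amplitude criterion with an estimated-SNR criterion. The sketch below is a hypothetical, simplified single-frame version of that idea: `select_channels` and its inputs are illustrative assumptions, and a real implementation would operate on the envelope of each filterbank channel per stimulation frame with a running noise power estimator.

```python
import numpy as np

def select_channels(envelope_power, noise_power_est, n_select=8):
    """Select n_select channels for stimulation by estimated SNR
    (envelope power over estimated noise power) rather than by raw
    amplitude. Returns a boolean mask over channels."""
    # Guard against division by zero in silent-channel estimates
    snr = envelope_power / np.maximum(noise_power_est, 1e-12)
    selected = np.zeros(envelope_power.shape, dtype=bool)
    selected[np.argsort(snr)[-n_select:]] = True
    return selected

# Toy frame: channel 1 is loud but noise-dominated, channel 4 is quiet
# but clean, so SNR-based selection prefers channel 4 over channel 2.
env = np.array([1.0, 5.0, 2.0, 8.0, 0.6])
noise = np.array([1.0, 1.0, 4.0, 1.0, 0.1])
print(select_channels(env, noise, n_select=2))
```

The abstract's conclusion is precisely about the `noise_power_est` input here: a tracker that follows the noise floor more accurately does not automatically yield a channel selection that improves intelligibility.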
The goal of computational speech segregation systems is to automatically segregate a target speaker from interfering maskers. Typically, these systems include a feature extraction stage in the front-end and a classification stage in the back-end. A spectro-temporal integration strategy can be applied either in the front-end, using the so-called delta features, or in the back-end, using a second classifier that exploits the posterior probability of speech from the first classifier across a spectro-temporal window. This study systematically analyzes the influence of such stages on segregation performance, the error distributions, and intelligibility predictions. Results indicated that it could be problematic to exploit context in the back-end, even though such a spectro-temporal integration stage improves the segregation performance. The results also emphasized the potential need for a single metric that comprehensively predicts computational segregation performance and correlates well with intelligibility. The outcome of this study could help to identify the most effective spectro-temporal integration strategy for computational segregation systems.
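The back-end strategy described above feeds a second classifier with the first classifier's speech posteriors gathered from a window around each time-frequency unit. The sketch below shows only that window-assembly step; `posterior_context` and the window radii are illustrative assumptions, not the study's actual configuration.

```python
import numpy as np

def posterior_context(posteriors, t, f, t_radius=2, f_radius=2):
    """Collect first-stage speech posteriors from a spectro-temporal
    window centred on unit (f, t) into a feature vector for a second
    classifier. posteriors: (frequency_channels, time_frames) array
    of P(speech | features)."""
    # Edge padding so units near the borders still get a full window
    padded = np.pad(posteriors,
                    ((f_radius, f_radius), (t_radius, t_radius)),
                    mode='edge')
    window = padded[f:f + 2 * f_radius + 1, t:t + 2 * t_radius + 1]
    return window.ravel()

# Toy posterior map: 16 channels x 40 frames
post = np.arange(16.0 * 40.0).reshape(16, 40) / (16.0 * 40.0)
vec = posterior_context(post, t=10, f=5)
print(vec.shape)  # (25,)
```

Because the second classifier sees only posteriors, errors of the first stage propagate into its input, which is one way the back-end integration can improve raw segregation scores while still being problematic, as noted above.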