A speech intelligibility prediction model is proposed that combines the auditory processing front end of the multi-resolution speech-based envelope power spectrum model [mr-sEPSM; Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134(1), 436–446] with a correlation back end inspired by the short-time objective intelligibility measure [STOI; Taal, Hendriks, Heusdens, and Jensen (2011). IEEE Trans. Audio Speech Lang. Process. 19(7), 2125–2136]. This “hybrid” model, named sEPSMcorr, is shown to account for the effects of stationary and fluctuating additive interferers as well as for the effects of non-linear distortions, such as spectral subtraction, phase jitter, and ideal time-frequency segregation (ITFS). The model shows a broader predictive range than both the original mr-sEPSM (which fails in the phase-jitter and ITFS conditions) and STOI (which fails to predict the influence of fluctuating interferers), albeit with lower accuracy than the source models in some individual conditions. Similar to other models that employ a short-term correlation-based back end, including STOI, the proposed model fails to account for the effects of room reverberation on speech intelligibility. Overall, the model might be valuable for evaluating the effects of a large range of interferers and distortions on speech intelligibility, including consequences of hearing impairment and hearing-instrument signal processing.
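To make the back-end concept concrete, the following is a minimal sketch of a STOI-style short-time correlation stage, assuming the mr-sEPSM front end has already produced temporal envelopes of the clean and degraded speech in one auditory channel. The function name, window length, and framing are illustrative assumptions, not the published implementation.

```python
import numpy as np

def short_time_envelope_correlation(env_clean, env_degraded, win_len=30):
    """Hypothetical sketch of a STOI-like short-time correlation back end.

    env_clean, env_degraded: 1-D temporal envelopes from one front-end
    channel; win_len: frame length in samples (assumed value).
    Returns the mean normalized cross-correlation across frames.
    """
    scores = []
    for start in range(0, len(env_clean) - win_len + 1, win_len):
        x = env_clean[start:start + win_len]
        y = env_degraded[start:start + win_len]
        x = x - x.mean()          # remove the frame mean before correlating
        y = y - y.mean()
        denom = np.linalg.norm(x) * np.linalg.norm(y)
        if denom > 0:
            scores.append(np.dot(x, y) / denom)
    return float(np.mean(scores)) if scores else 0.0
```

In a full model such a per-channel, per-frame score would be averaged over time, audio channels, and modulation channels before being mapped to intelligibility.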
The present study proposes a modeling approach for predicting speech intelligibility for normal-hearing (NH) and hearing-impaired (HI) listeners in conditions of stationary and fluctuating interferers. The model combines a non-linear model of the auditory periphery with a decision process that is based on the contrast across characteristic frequency (CF) after modulation analysis in the range of the fundamental frequency of speech. Specifically, the short-term across-CF correlation between noisy speech and noise alone is assumed to be inversely related to speech intelligibility. The model provided highly accurate predictions for NH listeners as well as largely plausible effects in response to changes in presentation level. Furthermore, the model could account for some of the main features in the HI data solely by adapting the peripheral model using a simplistic interpretation of the listeners’ hearing thresholds. The model’s predictive power may be substantially improved by refining the interpretation of the HI listeners’ profiles, and the model may thus provide a valuable basis for quantitatively modeling effects of outer hair-cell and inner hair-cell loss on speech intelligibility.
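As an illustration of the decision metric described above, the sketch below computes a short-term correlation across CF channels between internal representations of noisy speech and noise alone and reports its complement, which is assumed here to covary with intelligibility. The input format, function name, and simple averaging are assumptions made for illustration, not the study’s actual implementation.

```python
import numpy as np

def across_cf_dissimilarity(rep_noisy_speech, rep_noise, eps=1e-12):
    """Illustrative sketch (assumed interface): both inputs are
    [n_frames x n_CF] internal representations after peripheral and
    modulation processing. For each short-time frame, the correlation
    across CF channels between noisy speech and noise alone is computed;
    1 - correlation is returned as a quantity taken to increase with
    intelligibility, consistent with the inverse relation described above.
    """
    dissim = []
    for ns, n in zip(rep_noisy_speech, rep_noise):
        ns = ns - ns.mean()
        n = n - n.mean()
        r = np.dot(ns, n) / (np.linalg.norm(ns) * np.linalg.norm(n) + eps)
        dissim.append(1.0 - r)
    return float(np.mean(dissim))
```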
Consonant-vowel (CV) perception experiments provide valuable insights into how humans process speech. Here, two CV identification experiments were conducted in a group of hearing-impaired (HI) listeners, using 14 consonants followed by the vowel /ɑ/. The CVs were presented in quiet and with added speech-shaped noise at signal-to-noise ratios of 0, 6, and 12 dB. The HI listeners were provided with two different amplification schemes for the CVs. In the first experiment, a frequency-independent amplification (flat gain) was provided and the CVs were presented at the most comfortable loudness level. In the second experiment, a frequency-dependent prescriptive gain was provided. The CV identification results showed that, while the average recognition error score obtained with the frequency-dependent amplification was lower than that obtained with the flat gain, the main confusions made by the listeners on a token basis remained the same in the majority of cases. An entropy measure and an angular distance measure were proposed to assess the highly individual effects of the frequency-dependent gain on the consonant confusions in the HI listeners. The results suggest that the proposed measures, in combination with a well-controlled phoneme speech test, may be used to assess the impact of hearing-aid signal processing on speech intelligibility.
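For illustration, the sketch below gives one plausible reading of the two proposed measures applied to rows of a consonant confusion matrix: the Shannon entropy of the response distribution for a presented token, and the angle between the confusion vectors obtained with the two amplification schemes. The exact definitions used in the study may differ; the function names and normalizations here are assumptions.

```python
import numpy as np

def response_entropy(confusion_row):
    """Shannon entropy (bits) of one row of a consonant confusion matrix,
    i.e., the response distribution for a single presented token.
    Higher entropy indicates more scattered confusions."""
    p = np.asarray(confusion_row, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def angular_distance(row_a, row_b):
    """Angle (degrees) between two confusion vectors for the same token,
    e.g., obtained with flat-gain vs. frequency-dependent amplification.
    0 deg means the confusion pattern is unchanged by the gain scheme."""
    a = np.asarray(row_a, dtype=float)
    b = np.asarray(row_b, dtype=float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```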
Intelligibility models provide insights into the effects of target speech characteristics, transmission channels, and/or auditory processing on listeners’ speech perception performance. In 2011, Jørgensen and Dau proposed the speech-based envelope power spectrum model [sEPSM; Jørgensen and Dau (2011). J. Acoust. Soc. Am. 130(3), 1475–1487]. It uses the signal-to-noise ratio in the modulation domain (SNRenv) as a decision metric and was shown to accurately predict the intelligibility of processed noisy speech. The sEPSM concept has since been applied in various subsequent models, which have extended the predictive power of the original model to a broad range of conditions. This contribution presents the most recent developments within the sEPSM “family”: (i) a binaural extension, the B-sEPSM [Chabot-Leclerc et al. (2016). J. Acoust. Soc. Am. 140(1), 192–205], which combines better-ear and binaural unmasking processes and accounts for a large variety of spatial phenomena in speech perception; (ii) a correlation-based version [Relaño-Iborra et al. (2016). J. Acoust. Soc. Am. 140(4), 2670–2679], which extends the predictions of the early model to non-linear distortions, such as phase jitter and binary-mask processing; and (iii) a recent physiologically inspired extension, which makes it possible to functionally account for effects of individual hearing impairment on speech perception.
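To illustrate the SNRenv concept, the sketch below estimates an envelope-power signal-to-noise ratio per (audio x modulation) channel from the envelope power of the noisy speech and of the noise alone, and combines the channels by the square root of the sum of squares. Thresholding and other details of the published sEPSM are simplified; the function name and the noise-floor parameter are assumptions.

```python
import numpy as np

def snr_env(p_env_noisy_speech, p_env_noise, floor=0.001):
    """Minimal sketch of an SNRenv-style decision metric.

    Inputs are arrays of envelope power for the noisy speech and for the
    noise alone, one value per (audio x modulation) channel. The noise
    envelope power is subtracted to estimate the speech contribution
    (bounded below by a small floor), divided by the noise envelope power,
    and the per-channel ratios are combined across channels.
    """
    p_noisy = np.asarray(p_env_noisy_speech, dtype=float)
    p_noise = np.asarray(p_env_noise, dtype=float)
    p_speech = np.maximum(p_noisy - p_noise, floor * p_noise)
    snr = p_speech / np.maximum(p_noise, 1e-12)
    return float(np.sqrt(np.sum(snr ** 2)))
```

A higher combined SNRenv would then be mapped, e.g., through an ideal-observer stage, to a higher predicted intelligibility.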