Traditional noise-suppression algorithms have been shown to improve speech quality, but not speech intelligibility. Motivated by prior intelligibility studies of speech synthesized using the ideal binary mask, an algorithm is proposed that decomposes the input signal into time-frequency (T-F) units and makes binary decisions, based on a Bayesian classifier, as to whether each T-F unit is dominated by the target or the masker. Speech corrupted at low signal-to-noise ratio (SNR) levels (−5 and 0 dB) using different types of maskers is synthesized by this algorithm and presented to normal-hearing listeners for identification. Results indicated substantial improvements in intelligibility (over 60% points in −5 dB babble) over that attained by human listeners with unprocessed stimuli. The findings from this study suggest that algorithms that can estimate reliably the SNR in each T-F unit can improve speech intelligibility.
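The binary decision per T-F unit can be illustrated with the ideal binary mask that motivates the algorithm. The following is a minimal sketch, not the Bayesian classifier described in the abstract: it assumes the target and masker powers are known separately for every T-F unit (the "ideal" case), and the function names, the `lc_db` threshold value, and the `eps` guard are hypothetical choices made for illustration.

```python
import numpy as np


def ideal_binary_mask(target_power, masker_power, lc_db=0.0, eps=1e-12):
    """Return a 0/1 mask with 1 where the local SNR exceeds lc_db.

    target_power, masker_power: arrays of the same shape holding the
    power in each T-F unit. lc_db is an assumed local criterion in dB,
    and eps only guards against division by zero.
    """
    target_power = np.asarray(target_power, dtype=float)
    masker_power = np.asarray(masker_power, dtype=float)
    local_snr_db = 10.0 * np.log10((target_power + eps) / (masker_power + eps))
    return (local_snr_db > lc_db).astype(float)


def apply_mask(mixture_tf, mask):
    """Keep target-dominated T-F units of the mixture and zero the rest."""
    return np.asarray(mixture_tf) * mask


# Toy example: 2 frequency channels x 2 time frames.
target = np.array([[4.0, 1.0], [0.25, 9.0]])
masker = np.ones((2, 2))
mask = ideal_binary_mask(target, masker)
masked = apply_mask(target + masker, mask)
```

In a practical system the masker is not available separately, so the mask has to be estimated from the noisy input alone, which is the role the abstract assigns to the Bayesian classifier.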
The articulation index (AI), speech-transmission index (STI), and coherence-based intelligibility metrics have been evaluated primarily in steady-state noisy conditions and have not been tested extensively in fluctuating noise conditions. The aim of the present work is to evaluate the performance of new speech-based STI measures, modified coherence-based measures, and AI-based measures operating on short-term (30 ms) intervals in realistic noisy conditions. Much emphasis is placed on the design of new band-importance weighting functions which can be used in situations wherein speech is corrupted by fluctuating maskers. The proposed measures were evaluated with intelligibility scores obtained by normal-hearing listeners in 72 noisy conditions involving noise-suppressed speech (consonants and sentences) corrupted by four different maskers (car, babble, train, and street interferences). Of all the measures considered, the modified coherence-based measures and speech-based STI measures incorporating signal-specific band-importance functions yielded the highest correlations (r = 0.89-0.94). The modified coherence measure, in particular, that only included vowel/consonant transitions and weak consonant information yielded the highest correlation (r = 0.94) with sentence recognition scores. The results from this study clearly suggest that the traditional AI and STI indices could benefit from the use of the proposed signal- and segment-dependent band-importance functions.
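The idea of a band-importance weighted measure computed on short-term intervals can be sketched generically. The code below is an illustrative assumption, not one of the measures proposed in the abstract: it uses the common AI-style mapping that limits each band SNR to the range −15 to +15 dB and rescales it to [0, 1], then takes a weighted average over bands and a plain mean over frames. The function name and the example weights are hypothetical.

```python
import numpy as np


def short_term_weighted_index(band_snr_db, band_weights):
    """AI-style index from per-frame, per-band SNRs.

    band_snr_db: array of shape (frames, bands), SNR in dB for each
    short-term frame and frequency band.
    band_weights: array of shape (bands,), band-importance weights
    (normalized here so that they sum to 1).
    """
    snr = np.clip(np.asarray(band_snr_db, dtype=float), -15.0, 15.0)
    audibility = (snr + 15.0) / 30.0          # map [-15, 15] dB to [0, 1]
    weights = np.asarray(band_weights, dtype=float)
    weights = weights / weights.sum()
    per_frame = audibility @ weights          # weighted average over bands
    return float(per_frame.mean())            # plain average over frames


# Toy example: 2 frames x 2 bands with equal (hypothetical) weights.
index = short_term_weighted_index([[15.0, -15.0], [0.0, 0.0]], [1.0, 1.0])
```

Replacing the fixed weights with weights that depend on the signal or on the segment type changes only the `band_weights` argument (or makes it vary per frame), which is the kind of modification the abstract emphasizes.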
Making meaningful comparisons between the performance of the various speech enhancement algorithms proposed over the years has been elusive due to the lack of a common speech database, differences in the types of noise used, and differences in the testing methodology. To facilitate such comparisons, we report on the development of a noisy speech corpus suitable for evaluation of speech enhancement algorithms. This corpus is subsequently used for the subjective evaluation of 13 speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model based, and Wiener-type algorithms. The subjective evaluation was performed by Dynastat, Inc. using the ITU-T P.835 methodology designed to evaluate the speech quality along three dimensions: signal distortion, noise distortion, and overall quality. This paper reports the results of the subjective tests.
Unlike prior studies with bilateral cochlear implant users which considered only one interferer, the present study considered realistic listening situations wherein multiple interferers were present and in some cases originating from both hemifields. Speech reception thresholds were measured in bilateral users unilaterally and bilaterally in four different spatial configurations, with one and three interferers consisting of modulated noise and competing talkers. The data were analyzed in terms of binaural benefits including monaural advantage (better-ear listening) and binaural interaction. The total advantage (overall spatial release) received was 2-5 dB and was maintained with multiple interferers present. This advantage was dominated by the monaural advantage, which ranged from 1 to 6 dB and was largest when the interferers were mostly energetic. No binaural interaction benefit was found in the present study with either type of interferer (speech or noise). While the total and monaural advantages obtained for noise interferers were comparable to those attained by normal-hearing listeners, they were considerably lower for speech interferers. This suggests that bilateral users are less capable of taking advantage of binaural cues, in particular under conditions of informational masking. Furthermore, the use of noise interferers does not adequately reflect the difficulties experienced by bilateral users in real-life situations.