An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech

Taal, Cees H.; Hendriks, Richard C.; Heusdens, Richard; Jensen, Jesper

doi:10.1109/tasl.2011.2114881

Cited by 1,831 publications

(992 citation statements)

References 28 publications

Supporting

Mentioning

966

Contrasting

Unclassified

“…Especially popular measures are STOI [9], PESQ [10], or the word error rates of speech recognition systems. PESQ was originally designed as a measure for speech quality rather than intelligibility, but was then found to also correlate reasonably well with subjective intelligibility [11].…”

Section: Introduction (mentioning)

confidence: 99%

“…PESQ was originally designed as a measure for speech quality rather than intelligibility, but was then found to also correlate reasonably well with subjective intelligibility [11]. None of today's objective measures of intelligibility can perfectly predict intelligibility to humans, and their correlation depends on the type of speech degradation present [9,12].…”

Section: Introduction (mentioning)

confidence: 99%

Subjective Intelligibility of Deep Neural Network-Based Speech Enhancement

Gelderblom, Tronstad, Viggen

2017

Interspeech 2017

Recent literature indicates increasing interest in deep neural networks for use in speech enhancement systems. Currently, these systems are mostly evaluated through objective measures of speech quality and/or intelligibility. Subjective intelligibility evaluations of these systems have so far not been reported. In this paper we report the results of a speech recognition test with 15 participants, where the participants were asked to pick out words in background noise before and after enhancement using a common deep neural network approach. We found that, although the objective measure STOI predicts that intelligibility should improve or at the very least stay the same, the speech recognition threshold, which is a measure of intelligibility, deteriorated by 4 dB. These results indicate that STOI is not a good predictor for the subjective intelligibility of deep neural network-based speech enhancement systems. We also found that the postprocessing technique of global variance normalisation does not significantly affect subjective intelligibility.

“…Although these measures are designed to model human hearing, many of the most successful ones are based on principles that apply equally well to machine speech recognition. For example, in recent years, a simple algorithm known as the Short-Time Objective Intelligibility (STOI) measure has been shown to be a good predictor of intelligibility in a wide range of applications including time-frequency weighted noisy speech (Taal et al, 2011). The STOI measure is based on the sum of the correlation between the envelopes of the clean speech signal and the corrupted speech measured with 15 1/3-octave frequency bands starting at 150 Hz.…”

Section: Objective Intelligibility Measures (mentioning)

confidence: 99%

The third ‘CHiME’ speech separation and recognition challenge: Analysis and outcomes

Barker, Marxer, Vincent, et al.

2017

Computer Speech & Language

103

This paper presents the design and outcomes of the CHiME-3 challenge, the first open speech recognition evaluation designed to target the increasingly relevant multichannel, mobile-device speech recognition scenario. The paper serves two purposes. First, it provides a definitive reference for the challenge, including full descriptions of the task design, data capture and baseline systems along with a description and evaluation of the 26 systems that were submitted. The best systems re-engineered every stage of the baseline resulting in reductions in word error rate from 33.4% to as low as 5.8%. By comparing across systems, techniques that are essential for strong performance are identified. Second, the paper considers the problem of drawing conclusions from evaluations that use speech directly recorded in noisy environments. The degree of challenge presented by the resulting material is hard to control and hard to fully characterise. We attempt to dissect the various 'axes of difficulty' by correlating various estimated signal properties with typical system performance on a per session and per utterance basis. We find strong evidence of a dependence on signal-to-noise ratio and channel quality. Systems are less sensitive to variations in the degree of speaker motion. The paper concludes by discussing the outcomes of CHiME-3 in relation to the design of future mobile speech recognition evaluations.
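
The citation statement quoted for this entry describes STOI as a correlation between clean and corrupted speech envelopes over 15 one-third-octave bands starting at 150 Hz. The Python sketch below illustrates only that idea and is not the published algorithm: it assumes the per-band envelopes are already available as plain lists, it averages the per-band correlations rather than summing them, and it leaves out every other processing step of the full measure. The function names `third_octave_centers`, `correlation` and `mean_envelope_correlation` are hypothetical and chosen for illustration.

```python
import math


def third_octave_centers(f_min=150.0, n_bands=15):
    """Center frequencies in Hz of n_bands one-third-octave bands.

    The lowest band is centered at f_min; each following band is a
    factor of 2**(1/3) higher.
    """
    return [f_min * 2.0 ** (k / 3.0) for k in range(n_bands)]


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0.0:
        # A constant envelope carries no modulation to correlate against.
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def mean_envelope_correlation(clean_bands, degraded_bands):
    """Average the per-band envelope correlations (illustrative score only).

    Both arguments are lists with one envelope (a list of floats) per band.
    """
    if len(clean_bands) != len(degraded_bands) or not clean_bands:
        raise ValueError("need the same non-zero number of bands in both inputs")
    scores = [correlation(c, d) for c, d in zip(clean_bands, degraded_bands)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    centers = third_octave_centers()
    print(len(centers), "bands, lowest center:", centers[0], "Hz")

    # Toy envelopes (made-up numbers): two bands, four frames each.
    clean = [[0.1, 0.4, 0.9, 0.3], [0.2, 0.2, 0.8, 0.5]]
    degraded = [[0.2, 0.5, 0.8, 0.4], [0.3, 0.1, 0.6, 0.6]]
    print("illustrative score:", mean_envelope_correlation(clean, degraded))
```

A score near 1 means the degraded envelopes follow the clean ones closely in every band. The value is not comparable to scores produced by an actual STOI implementation.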

“…The same reconstruction procedures were used in [15] [17]. Table 2 summarizes the comparisons of the wide-matching method against five conventional speech enhancement methods on the Segmental SNR, PESQ and STOI [21] measures, respectively, as a function of the input test sentence SNR averaged over 1152 test sentences (i.e., 192 test sentences per noise type × 6 noise types) under each SNR condition. The wide-matching method did not use any noise estimation while the conventional methods, Log-MMSE [22], LogMMSE-SPU [23], Wiener filtering [24], KLT [25] and Perceptual KLT [26], each used an algorithm to estimate the noise.…”

Section: Experimental Studies (mentioning)

confidence: 99%

Wide matching — An approach to improving noise robustness for speech enhancement

Crookes

2016

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
