2016
DOI: 10.1121/1.4952439

Outcome measures based on classification performance fail to predict the intelligibility of binary-masked speech

Abstract: To date, the most commonly used outcome measure for assessing ideal binary mask estimation algorithms is based on the difference between the hit rate and the false alarm rate (H-FA). Recently, the error distribution has been shown to substantially affect intelligibility. However, H-FA treats each mask unit independently and does not take into account how errors are distributed. Alternatively, algorithms can be evaluated with the short-time objective intelligibility (STOI) metric using the reconstructed speech.…
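
To illustrate the point about H-FA ignoring how errors are distributed, here is a minimal Python sketch. It is not the authors' code. It assumes the usual definitions: the hit rate is the fraction of target-dominated units (1 in the ideal mask) that the estimate also labels 1, and the false alarm rate is the fraction of masker-dominated units (0 in the ideal mask) that the estimate wrongly labels 1. The function name `h_fa` and the toy one-dimensional masks are hypothetical and chosen only for illustration.

```python
def h_fa(ideal, estimated):
    """Hit rate minus false alarm rate for two flat binary masks (0/1 values).

    Assumed definitions (not taken from the paper's text):
      hit rate         = correctly kept target units / all target units
      false alarm rate = wrongly kept masker units   / all masker units
    """
    if len(ideal) != len(estimated):
        raise ValueError("masks must have the same number of units")

    pairs = list(zip(ideal, estimated))
    n_target = sum(1 for i, _ in pairs if i == 1)
    n_masker = len(pairs) - n_target
    if n_target == 0 or n_masker == 0:
        raise ValueError("ideal mask needs both target and masker units")

    hits = sum(1 for i, e in pairs if i == 1 and e == 1)
    false_alarms = sum(1 for i, e in pairs if i == 0 and e == 1)
    return hits / n_target - false_alarms / n_masker


# Toy ideal mask: four target-dominated units followed by four masker-dominated units.
ideal = [1, 1, 1, 1, 0, 0, 0, 0]

# Two estimates with the same number of misses, placed differently.
clustered = [0, 0, 1, 1, 0, 0, 0, 0]  # the two misses are adjacent
scattered = [0, 1, 0, 1, 0, 0, 0, 0]  # the two misses are spread out

score_clustered = h_fa(ideal, clustered)
score_scattered = h_fa(ideal, scattered)
print(score_clustered, score_scattered)
```

Both estimates receive the same score because each unit is counted independently of its neighbors, so a per-unit measure of this kind cannot distinguish clustered errors from scattered ones.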

Cited by 9 publications (5 citation statements)
References 21 publications
“…In general, these results suggest that the STOI tends to underestimate human performance for unprocessed reverberant-noisy speech and overestimate human performance for processed speech. The current observation is somewhat different from those for noisy speech enhancement, for which the STOI overpredicts intelligibility for both unprocessed and processed signals (Healy et al., 2015; Kressner et al., 2016). Figure 10 compares ESTOI-predicted recognition scores and actual recognition scores.…”
Section: B Objective Measures Of Intelligibility
Citation type: contrasting
confidence: 56%
“…A recent study highlighted potential limitations of STOI in predicting the intelligibility of binary-masked speech [22]. Two observations from this study support these findings.…”
Section: Discussion
Citation type: supporting
confidence: 83%
“…The value range of STOI is typically between 0 and 1, which can be interpreted as percent correct. Although STOI tends to overpredict intelligibility scores [64], [102], no alternative metric has been shown to consistently correlate with human intelligibility better. For speech quality, PESQ (perceptual evaluation of speech quality) is the standard metric [140] and recommended by the International Telecommunication Union (ITU) [87].…”
Section: Training Targets
Citation type: mentioning
confidence: 99%