2017
DOI: 10.1109/taslp.2016.2628641

Speech Intelligibility Potential of General and Specialized Deep Neural Network Based Speech Enhancement Systems

Cited by 187 publications (114 citation statements)
References 53 publications
“…The third loss function we consider is based on the short-time objective intelligibility (STOI) speech intelligibility estimator [32]. STOI is currently the, perhaps, most commonly used speech intelligibility estimator for objectively evaluating the performance of speech enhancement systems [6], [7], [9], [13]. This is presumably driven by the fact that STOI has proven to be able to quite accurately predict the intelligibility of noisy/processed speech in a large range of acoustic scenarios, including ideal time-frequency weighted noisy speech [32] and noisy speech enhanced by single-microphone time-frequency weighting-based speech enhancement systems [32] (see also [33], [56]).…”
Section: Short-time Objective Intelligibility
Citation type: mentioning
confidence: 99%
“…This subset of WSJ0 consists in total of 11613 utterances approximately equally divided among 44 male speakers and 47 female speakers. This ensures that the training dataset contains a large speaker variability, which allows the final speech enhancement system to be largely speaker independent [9].…”
Section: A Noise-free Speech Mixtures
Citation type: mentioning
confidence: 99%
“…In [54] it is found that the AFE enhancement method outperforms MMSE-based methods for noise-robust speech recognition. Recently DNN based speech enhancement methods have also been proposed for improving speech intelligibility [55], automatic speech recognition [56] and speaker verification [57,58].…”
Section: Robust VAD in Noise
Citation type: mentioning
confidence: 99%
“…However, a significant amount of research has focused on the offline setting with many algorithms being unsuitable for real-time use due to batch processing or computational requirements. Recent speech enhancement and source separation approaches based on deep neural networks offer impressive performance gains compared with traditional real-time signal processing methods [1]-[4], however these methods tend to be computationally demanding, preventing their use in low-power devices, and often rely on future information, preventing their use in real-time systems. Deep learning methods also require a significant…”
Section: Introduction
Citation type: mentioning
confidence: 99%