2018
DOI: 10.48550/arxiv.1811.11307
Preprint

Improved Speech Enhancement with the Wave-U-Net

Cited by 37 publications (59 citation statements)
References 0 publications
“…Following previous works of speech enhancement [24,12,25], we apply Perceptual evaluation of speech quality (PESQ) [26], Mean opinion score (MOS) predictor of signal distortion (CSIG), MOS predictor of background-noise intrusiveness (CBAK), MOS predictor of overall signal quality (COVL) [27] and segmental signal-to-noise ratio (SSNR) [28] to evaluate the speech enhancement performance. Table 1 shows that noisy speech without enhancement achieves PESQ, CSIG, CBAK, COVL, SSNR of 1.97, 3.35, 2.44, 2.63 and 1.68 dB respectively.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Moreover, word error rate (WER) is also computed to assess the effects of the enhancement for speech recognition purposes. For this purpose, we use a Wav2Vec [28] architecture pre-trained on Librispeech 960h. The final metric for this task is a combination of these two measures given by (STOI + (1 − WER))/2.…”
Section: Task 1: 3D Speech Enhancement in Office Reverberant Environment
Citation type: mentioning
confidence: 99%
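The combined Task 1 metric quoted above, (STOI + (1 − WER))/2, can be sketched in a few lines. This is a minimal illustration only: the function name `combined_metric` is hypothetical, and the STOI and WER values are assumed to be computed elsewhere and passed in as fractions rather than percentages. The quoted excerpt does not say how WER values above 1 are handled, so the sketch applies no clipping.

```python
def combined_metric(stoi: float, wer: float) -> float:
    """Average of STOI and (1 - WER), as in the quoted formula.

    Assumptions (not stated in the excerpt): both inputs are
    fractions, e.g. wer=0.25 for a 25% word error rate, and no
    clipping is applied if wer exceeds 1.
    """
    return (stoi + (1.0 - wer)) / 2.0


if __name__ == "__main__":
    # Perfect intelligibility and no recognition errors give the maximum score.
    print(combined_metric(1.0, 0.0))
    # Illustrative inputs only, not results from any paper.
    print(combined_metric(0.5, 0.5))
```

Because the two terms are weighted equally, a given improvement in STOI moves the score exactly as much as the same absolute reduction in WER.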
“…Neural beamforming techniques such as Filter and Sum Networks (FaS-Net) [5] provide state-of-the-art results for Ambisonics-based SE and are usually suitable for low-latency scenarios. U-Net-based approaches also provide competitive results in this context, both for monaural [6,7] and multichannel SE tasks [8], at the expense of higher computational power demand. Other techniques to perform SE include recurrent neural networks (RNNs) [9], graph-based spectral subtraction [10], discriminative learning [11], and dilated convolutions [12,13].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…U-Net was first introduced for image segmentation and attained several state-of-the-art results [19]. Recently, Wave-U-Net was proposed by [20] to improve audio source separation and speech enhancement [21]. However, the previous U-Net-based methods did not consider the sequence-to-sequence mechanism such as temporal dependency.…”
Section: Model Defense by U-Net Based Speech Enhancement
Citation type: mentioning
confidence: 99%