2020
DOI: 10.3390/app10093230

Multi-Task Learning U-Net for Single-Channel Speech Enhancement and Mask-Based Voice Activity Detection

Abstract: In this paper, a multi-task learning U-shaped neural network (MTU-Net) is proposed and applied to single-channel speech enhancement (SE). The proposed MTU-based SE method estimates an ideal binary mask (IBM) or an ideal ratio mask (IRM) by extending the decoding network of a conventional U-Net to simultaneously model the speech and noise spectra as the target. The effectiveness of the proposed SE method was evaluated under both matched and mismatched noise conditions between training and testing by measuring t…
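The abstract names two mask targets, the IBM and the IRM. As a rough illustration, the sketch below computes both from known speech and noise magnitude spectra using the commonly cited textbook definitions. The function names, the 0 dB local criterion, and the exponent of 0.5 are assumptions chosen for illustration and are not taken from the paper, whose exact formulation may differ.

```python
import numpy as np

EPS = 1e-12  # avoids division by zero and log(0)


def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
    """Hypothetical helper: 1 where the local SNR exceeds lc_db, else 0."""
    snr_db = 20.0 * np.log10((speech_mag + EPS) / (noise_mag + EPS))
    return (snr_db > lc_db).astype(np.float32)


def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Hypothetical helper: (S^2 / (S^2 + N^2)) ** beta, bounded in [0, 1]."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return (s2 / (s2 + n2 + EPS)) ** beta


# Toy spectra with shape (freq_bins, frames); values are illustrative only.
speech = np.array([[2.0, 0.1]])
noise = np.array([[1.0, 1.0]])
ibm = ideal_binary_mask(speech, noise)
irm = ideal_ratio_mask(speech, noise)
```

In a mask-based SE system, a network would be trained to predict such a mask from the noisy spectrogram, and the predicted mask would then be multiplied element-wise with the noisy magnitude spectrogram.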

citations
Cited by 27 publications
(15 citation statements)
references
References 37 publications
“…Similar approaches could be used to remove noise. For example, spectral gating could be extended with neural networks by training a neural network to learn a mask to gate away background noise and recover the lower-noise spectrogram, as has been done in speech enhancement applications (Wang and Chen, 2018; Lee and Kim, 2020).…”
Section: Signal Processing and Denoising (mentioning)
confidence: 99%
“…Voice activity detection (VAD) is a technique for detecting the presence of speech signal in speech data [22]. It has been widely used to enhance the speech contents such as speech classification [23], speaker recognition [24], and speech enhancement [25,26]. Figure 4 shows three processing steps for VAD: (1) noise reduction, (2) segmentation, and (3) elimination [27].…”
Section: Voice Activity Detection (mentioning)
confidence: 99%
“…1b). Lee et al [16] used U-Net to estimate clean speech spectra and noise spectra simultaneously, and then used the enhanced speech spectrogram to conduct VAD directly by thresholding. Jung et al [17] used the output and latent variable of a denoising variational autoencoder-based SE module as the input of VAD.…”
Section: Introduction (mentioning)
confidence: 99%
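The last statement above describes running VAD directly on the enhanced speech spectrogram by thresholding. A minimal sketch of that idea follows, assuming a per-frame energy measure and a threshold set relative to the loudest frame. Both choices, along with the function name and the -40 dB default, are illustrative assumptions and not the cited paper's procedure.

```python
import numpy as np


def frame_vad(enhanced_mag, threshold_db=-40.0):
    """Hypothetical helper: flag frames whose energy is within
    |threshold_db| dB of the loudest frame.

    enhanced_mag: magnitude spectrogram with shape (freq_bins, frames).
    Returns a boolean array with one entry per frame.
    """
    energy = np.sum(enhanced_mag ** 2, axis=0)       # energy per frame
    energy_db = 10.0 * np.log10(energy + 1e-12)      # floor avoids log(0)
    return energy_db > np.max(energy_db) + threshold_db


# Toy enhanced spectrogram: only the middle frame carries energy.
enhanced = np.zeros((4, 3))
enhanced[:, 1] = 1.0
decisions = frame_vad(enhanced)
```

A relative threshold keeps the decision independent of the overall signal level, but it assumes every input contains at least one speech frame. An absolute threshold would be needed for inputs that may be silent throughout.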