2021
DOI: 10.1186/s13636-021-00207-6

Components loss for neural networks in mask-based speech enhancement

Abstract: Estimating time-frequency domain masks for single-channel speech enhancement using deep learning methods has recently become a popular research field with promising results. In this paper, we propose a novel components loss (CL) for the training of neural networks for mask-based speech enhancement. During the training process, the proposed CL offers separate control over preservation of the speech component quality, suppression of the noise component, and preservation of a naturally sounding residual noise com…

citations
Cited by 12 publications
(10 citation statements)
references
References 43 publications
“…Our previous work denoted as "FCRN/PESQNet, [24]" achieves two 1st-ranked metrics and one 2nd rank and significantly outperforms the DNS3 baseline [49] in speech quality measured by PESQ. Under both reverberation conditions, the components loss baseline "FCRN/CL [50]" offers around 0.1 points higher PESQ scores compared to the DNS3 baseline [49], but does not perform so well on DNSMOS. Furthermore, the CL baseline "FCRN/CL [50]" offers the worst dereverberation effects reflected by the lowest SRMR scores among all the baseline methods.…”
Section: Hyperparameter Optimization and Analysis
Citation type: mentioning
confidence: 91%
“…In [50], a components loss (CL) was proposed for training a mask-based speech enhancement neural network, which offers separate controls over preservation of the speech component quality, suppression of the noise component, and preservation of a natural sounding residual noise component. The experimental results of [50] show improved and balanced performance compared to the conventional MSE loss, the approximated differentiable PESQ loss proposed in [28], and the perceptual weighting filter loss proposed in [30], which is based on code-excited linear predictive (CELP) speech coding. We fine-tune the pre-trained DNS model employing CL on $\mathcal{D}^{\mathrm{train}}_{\mathrm{DNS3}}$.…”
Section: Components Loss Baseline
Citation type: mentioning
confidence: 99%
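
The three controls named in the statement above (speech preservation, noise suppression, natural-sounding residual noise) can be illustrated with a small NumPy sketch. This is not the formulation from the cited paper, which this page does not reproduce. It assumes that the clean speech and noise magnitude spectra are both available during training (as they are when noisy mixtures are synthesized), that the same estimated mask is applied to each component separately, and that "naturalness" can be approximated by comparing the per-frame spectral shape of the noise before and after masking. The function name `components_loss_sketch` and the weights `w_speech`, `w_noise`, and `w_natural` are hypothetical.

```python
import numpy as np


def components_loss_sketch(mask, speech_mag, noise_mag,
                           w_speech=1.0, w_noise=1.0, w_natural=1.0,
                           eps=1e-8):
    """Illustrative three-term loss; NOT the exact loss of the cited paper.

    All arrays have shape (frequency_bins, frames) and hold magnitudes.
    """
    # Apply the same mask to each component separately (possible only in
    # training, where speech and noise are known individually).
    filtered_speech = mask * speech_mag
    filtered_noise = mask * noise_mag

    # Term 1: the filtered speech component should stay close to clean speech.
    speech_term = np.mean((filtered_speech - speech_mag) ** 2)

    # Term 2: the filtered noise component should carry little power.
    noise_term = np.mean(filtered_noise ** 2)

    # Term 3 (assumed proxy for naturalness): the residual noise should keep
    # the per-frame spectral shape of the original noise.
    shape_in = noise_mag / (noise_mag.sum(axis=0, keepdims=True) + eps)
    shape_out = filtered_noise / (filtered_noise.sum(axis=0, keepdims=True) + eps)
    natural_term = np.mean((shape_out - shape_in) ** 2)

    total = w_speech * speech_term + w_noise * noise_term + w_natural * natural_term
    terms = {"speech": speech_term, "noise": noise_term, "natural": natural_term}
    return total, terms
```

Keeping the terms separate is what gives the separate control: raising `w_noise` favors stronger suppression, while raising `w_speech` or `w_natural` penalizes speech distortion or unnatural residual noise more heavily.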
“…The network is trained using either ideal binary mask (IBM) or ideal ratio mask (IRM) as training targets [20, 21]. Typically, the networks are trained using the mean squared error (MSE) either on the masks or on the reconstructed signal [22, 23]. Despite the promising performance achievable in terms of SDR and intelligibility, the presence of artifacts in the reconstructed signals compromises the performance of further processing stages.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
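
The training targets and the two MSE variants mentioned in the statement above can be sketched as follows. The definitions used here are common textbook forms and an assumption, not necessarily those of the citing paper: the IBM is 1 where the local SNR exceeds a threshold (0 dB here), the IRM is the square root of the speech-to-mixture power ratio, and the signal-domain MSE compares the masked noisy magnitude with the clean speech magnitude. All function names are hypothetical.

```python
import numpy as np


def ideal_binary_mask(speech_mag, noise_mag, threshold_db=0.0, eps=1e-12):
    """1.0 where the local SNR (in dB) exceeds the threshold, else 0.0."""
    snr_db = 20.0 * np.log10((speech_mag + eps) / (noise_mag + eps))
    return (snr_db > threshold_db).astype(np.float64)


def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """Square root of speech power over speech-plus-noise power, in [0, 1]."""
    speech_pow = speech_mag ** 2
    noise_pow = noise_mag ** 2
    return np.sqrt(speech_pow / (speech_pow + noise_pow + eps))


def mse_on_mask(estimated_mask, target_mask):
    """MSE computed directly between the estimated and the target mask."""
    return float(np.mean((estimated_mask - target_mask) ** 2))


def mse_on_signal(estimated_mask, noisy_mag, speech_mag):
    """MSE between the masked noisy magnitude and the clean speech magnitude."""
    return float(np.mean((estimated_mask * noisy_mag - speech_mag) ** 2))
```

The mask-domain loss treats every time-frequency bin equally, whereas the signal-domain loss implicitly weights bins by the noisy magnitude, so errors in high-energy bins cost more.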