2022
DOI: 10.1109/taslp.2022.3165442

Deep Noise Suppression Maximizing Non-Differentiable PESQ Mediated by a Non-Intrusive PESQNet

Abstract: Speech enhancement employing deep neural networks (DNNs) for denoising is called deep noise suppression (DNS). The DNS trained with mean squared error (MSE) losses cannot guarantee good perceptual quality. Perceptual evaluation of speech quality (PESQ) is a widely used metric for evaluating speech quality. However, the original PESQ algorithm is non-differentiable and therefore cannot be used directly as an optimization criterion for gradient-based learning. In this work, we propose an end-to-end non-intrusive PESQ…

Cited by 10 publications (11 citation statements)
References 63 publications
“…Both versions of DNSMOS require the input speech signals having a fixed length of nine seconds. In our recent works [21], [22], [24], we proposed an end-to-end PESQNet for DNS applications, adapted from a BLSTM-based speech emotion recognition DNN [35], to predict PESQ scores of the enhanced speech signal. In these works, the trained PESQNet is employed as a mediator to provide a differentiable PESQ loss during a speech enhancement DNN training, aiming at maximizing the PESQ score of the enhanced speech signal.…”
Section: Introduction (mentioning)
confidence: 99%
“…As concerns topology, we build upon [22], but many changes are required for the DNN to serve the speech communication monitoring needs targeted in this work: (1) Compared to PESQNet, the novel PESQ-DNN employs a complex spectrogram as input to explicitly consider phase influences in the perceived speech quality. Except for a few works, e.g., WaweNet, most speech quality prediction DNNs employ amplitude or power spectrogram input, leading to the problem that speech quality degradations caused by phase distortions cannot be measured.…”
Section: Introduction (mentioning)
confidence: 99%
“…Machine-learning-based methods have been proposed to eliminate the dependence on clean speech references during inference and can be further divided into two categories. The first attempts to non-intrusively estimate the objective scores mentioned above (Fu et al, 2018; Dong & Williamson, 2020; Zezario et al, 2020; Catellier & Voran, 2020; Yu et al, 2021b; Xu et al, 2022; Kumar et al, 2023). However, during training, noisy/processed and clean speech pairs are still required to obtain the objective scores as model targets.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, such objective functions must be carefully designed as many objective measures contain calculations that are non-differentiable. Several systems circumvent this limitation via use of an additional model that mimics the behaviour of the metric [12]-[14], with this network being used as a surrogate of the metric used as an objective function in training of the speech enhancement model. The baseline system that this work builds upon is one such system, MetricGAN+ [15] (itself an extension of previous work MetricGAN [16]).…”
Section: Introduction (mentioning)
confidence: 99%
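
Illustrative sketch (not taken from the cited papers): the citation statements above describe using a learned model as a differentiable stand-in for a non-differentiable metric. The toy Python example below shows only that general idea, under loud assumptions. `black_box_metric` is a hypothetical quantized score, not PESQ; the surrogate is a simple quadratic fitted by least squares, not a DNN such as PESQNet or the MetricGAN discriminator; and the single scalar `gain` stands in for the parameters of an enhancement model. All function names are invented for this illustration.

```python
def black_box_metric(gain):
    """Hypothetical non-differentiable score (NOT PESQ): it peaks near
    gain = 0.7 and is quantized to 0.1, so small changes in gain usually
    leave the score unchanged and its gradient is zero almost everywhere."""
    raw = 4.5 - 8.0 * (gain - 0.7) ** 2
    return round(raw * 10) / 10


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_surrogate(xs, ys):
    """Least-squares fit of the differentiable surrogate a*x^2 + b*x + c
    to sampled metric values, using the normal equations."""
    S = [sum(x ** k for x in xs) for k in range(5)]
    T = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    A = [[S[4], S[3], S[2]],
         [S[3], S[2], S[1]],
         [S[2], S[1], S[0]]]
    return solve3(A, [T[2], T[1], T[0]])


def maximize_via_surrogate(start=0.1, lr=0.05, steps=200):
    """Step 1: query the black-box metric and fit the surrogate.
    Step 2: run gradient ascent on the surrogate instead of the metric."""
    xs = [i / 20 for i in range(21)]
    ys = [black_box_metric(x) for x in xs]
    a, b, _c = fit_surrogate(xs, ys)
    gain = start
    for _ in range(steps):
        gain += lr * (2 * a * gain + b)  # derivative of a*x^2 + b*x + c
    return gain


if __name__ == "__main__":
    tuned = maximize_via_surrogate()
    print("score before:", black_box_metric(0.1))
    print("score after: ", black_box_metric(tuned))
```

The design point is the separation of roles: the metric is only ever evaluated, never differentiated, while all gradients come from the fitted surrogate. The systems quoted above additionally keep retraining the surrogate as the enhancement model changes; this sketch fits it once for brevity.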