2019
DOI: 10.3390/app9112289

Disentangled Feature Learning for Noise-Invariant Speech Enhancement

Abstract: Most of the recently proposed deep learning-based speech enhancement techniques have focused on designing the neural network architectures as a black box. However, it is often beneficial to understand what kinds of hidden representations the model has learned. Since the real-world speech data are drawn from a generative process involving multiple entangled factors, disentangling the speech factor can encourage the trained model to result in better performance for speech enhancement. With the recent success in …

citations
Cited by 3 publications
(2 citation statements)
references
References 44 publications
“…PESQ estimates the subjective mean opinion score for a group of normal-hearing listeners regarding the perceived audio quality over telephone networks, when degraded by speech or noise distortions. It ranges from −0.5 (or 1.0 in most cases) to 4.5 and is widely used to assess speech processing algorithms [2,21,56,63], indicating the speech quality measurement of enhanced speech.…”
Section: Objective Evaluation Criteria
Citation type: mentioning
confidence: 99%
“…Recently, Lang and Yang (2020) [20] demonstrated the effectiveness of fusing complementary features to magnitude-aware targets by separately learning phase representations. In addition, Bae et al (2019) [21] explored a framework for disentangling speech and noise for noise-invariant speech enhancement, offering more robust noise-invariant properties. In Rao and Carney (2014) [22], a vowel enhancement strategy is proposed to restore the representation of formants at the level of the midbrain by performing formant tracking and enhancement.…”
Section: Introduction
Citation type: mentioning
confidence: 99%