2023
DOI: 10.1109/taslp.2023.3265202

Integrating Uncertainty Into Neural Network-Based Speech Enhancement

Abstract: Supervised masking approaches in the time-frequency domain aim to employ deep neural networks to estimate a multiplicative mask to extract clean speech. This leads to a single estimate for each input without any guarantees or measures of reliability. In this paper, we study the benefits of modeling uncertainty in clean speech estimation. Prediction uncertainty is typically categorized into aleatoric uncertainty and epistemic uncertainty. The former refers to inherent randomness in data, while the latter descri…

Cited by 4 publications (3 citation statements)
References 64 publications
“…Some researchers have used mapping targets in addition to the spectrum or mask (Xu et al, 2017; Fang et al, 2023). Xu et al (2017) proposed a DNN that learned the magnitude spectrum as the primary target and MFCC as the secondary target.…”
Section: Deep Learning Methods (citation type: mentioning)
confidence: 99%
“…The additional MFCC estimation imposed constraints that were not applicable in the prediction of the magnitude spectrum alone, improving the prediction performance of the primary target. Fang et al (2023) proposed a framework for jointly modeling random uncertainties and uncertainties due to insufficient training data for deep-learning-based Wiener filter estimation for speech enhancement. The involvement of modeling uncertainties increased the robustness of the estimator, and it was shown that this method preserved more speech at the cost of decreasing the amount of noise reduction slightly.…”
Section: Deep Learning Methods (citation type: mentioning)
confidence: 99%