Interspeech 2018
DOI: 10.21437/interspeech.2018-2423

A Priori SNR Estimation Based on a Recurrent Neural Network for Robust Speech Enhancement

Abstract: Speech enhancement under highly non-stationary noise conditions remains a challenging problem. Classical methods typically attempt to identify a frequency-domain optimal gain function that suppresses noise in noisy speech. These algorithms typically produce artifacts such as "musical noise" that are detrimental to machine and human understanding, largely due to inaccurate estimation of noise power spectra. The optimal gain function is commonly referred to as the ideal ratio mask (IRM) in neural-network-based s…

Cited by 6 publications (6 citation statements)
References 18 publications (34 reference statements)
“…During experiments, we notice that even though our systems trained on MSE (e.g. row 4 in Table 1) could achieve similar objective measures compared to those trained on the proposed weighted losses (12), the corresponding subjective quality of systems trained on the weighted loss is a lot better. The most noticeable improvement of systems trained on our loss functions, especially with small α, is that the estimated gain function is much more frequency-selective than systems trained on regular MSE, resulting in higher noise suppression, especially at high SNRs.…”
Section: Methods (mentioning)
confidence: 94%
“…In this work, we study real-time speech enhancement with recurrent neural network (RNN). Recent works involving RNNs demonstrated promising results [10], even at very low signal-to-noise ratio (SNR) scenarios [11,12].…”
Section: Introduction (mentioning)
confidence: 99%
“…Unlike previous a priori SNR estimators, the proposed estimators do not require a noise estimator. Recently, a recurrent neural network (RNN) was used to aid the DD approach in a priori SNR estimation [15]. The proposed estimators differ by directly estimating the a priori SNR.…”
Section: Accepted Manuscript (mentioning)
confidence: 99%
“…The proposed a priori SNR estimators significantly outperform the previous a priori SNR estimation methods. Evaluating the results in [15], the RNN-assisted DD approach (a deep learning-based a priori SNR estimator) could only outperform the DD approach at higher SNR levels (5 dB and greater for signal-to-distortion ratio (SDR)). Here, the ResLSTM and ResBLSTM a priori SNR estimators significantly outperform the DD approach for all conditions.…”
Section: Accepted Manuscript (mentioning)
confidence: 99%
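
Several of the citation statements above compare against the decision-directed (DD) approach to a priori SNR estimation. For orientation, the following is a minimal sketch of the classical DD recursion for a single frequency bin, paired with a Wiener gain. It is not the RNN-assisted method of the cited paper. The function name `dd_a_priori_snr`, the smoothing factor `alpha=0.98`, the floor `xi_min`, and the assumption of a known, constant noise power are illustrative choices, not details taken from the source.

```python
def dd_a_priori_snr(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Sketch of decision-directed a priori SNR estimation for one bin.

    noisy_power: list of |Y|^2 values, one per frame.
    noise_power: noise power for this bin (assumed known and constant here).
    alpha:       smoothing factor (illustrative default).
    xi_min:      lower floor on the estimate (illustrative default).
    Returns (xi_track, gain_track), one value per frame.
    """
    xi_track, gain_track = [], []
    prev_clean_power = 0.0  # |S_hat|^2 estimated in the previous frame

    for frame_idx, y_pow in enumerate(noisy_power):
        gamma = y_pow / noise_power        # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)    # instantaneous estimate, clipped at 0

        if frame_idx == 0:
            xi = ml_term                   # no previous frame to rely on
        else:
            # Weighted mix of the previous frame's clean-speech estimate
            # and the current instantaneous estimate.
            xi = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term

        xi = max(xi, xi_min)
        gain = xi / (1.0 + xi)             # Wiener gain from the a priori SNR

        prev_clean_power = (gain ** 2) * y_pow
        xi_track.append(xi)
        gain_track.append(gain)

    return xi_track, gain_track


if __name__ == "__main__":
    # Two noise-only frames followed by two frames with a strong component.
    xi, gain = dd_a_priori_snr([1.0, 1.0, 10.0, 10.0], noise_power=1.0)
    for frame_idx, (x, g) in enumerate(zip(xi, gain)):
        print(frame_idx, round(x, 4), round(g, 4))
```

With `alpha` close to 1 the estimate reacts slowly to speech onsets, which is one reason the DD approach depends heavily on the accuracy of the noise power estimate it is given.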