2020 International Joint Conference on Neural Networks (IJCNN) 2020
DOI: 10.1109/ijcnn48605.2020.9206623

Mapping and Masking Targets Comparison using Different Deep Learning based Speech Enhancement Architectures

Abstract: Mapping and masking targets are both widely used in recent Deep Neural Network (DNN) based supervised speech enhancement. Masking targets have been shown to improve the intelligibility of the output speech, while mapping targets have been found, in other studies, to generate speech of better quality. However, most of these studies compare the two approaches using the Multilayer Perceptron (MLP) architecture only. With the emergence of new architectures that outperform the MLP, a more gene…

citations
Cited by 15 publications
(5 citation statements)
references
References 46 publications
“…where, [30]. The noisy phase was stored to be added to the final estimated clean speech, assuming that the phase component is not highly affected by noise, compared to the magnitude [20].…”
Section: The Proposed Speech Enhancement Approach, A. Problem Definition (mentioning)
confidence: 99%
“…While it is referred to as classification problem if the target is to estimate a matrix, known as a mask. The mask is applied as filter to the output to produce the enhanced clean speech signal [39].…”
Section: Speech Enhancement (mentioning)
confidence: 99%
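
The distinction drawn in the statement above, between estimating the clean spectrum directly (mapping) and estimating a mask that is then applied as a filter (masking), can be illustrated with a small sketch. It is not taken from the cited paper: the spectral magnitude mask used here (clean magnitude divided by noisy magnitude, clipped) is one commonly used masking target and is assumed for illustration, as are the function names `spectral_magnitude_mask` and `apply_mask`, the clipping range, and the toy magnitudes. The mask is applied to the noisy magnitude, which is also an assumption.

```python
import numpy as np


def spectral_magnitude_mask(clean_mag, noisy_mag, eps=1e-8, max_value=1.0):
    """Masking target (assumed form): per-bin clean/noisy magnitude ratio, clipped."""
    mask = clean_mag / (noisy_mag + eps)  # eps avoids division by zero
    return np.clip(mask, 0.0, max_value)


def apply_mask(mask, noisy_mag):
    """Apply the mask as an element-wise filter on the noisy magnitude."""
    return mask * noisy_mag


# Toy magnitude spectrograms (frequency bins x time frames), hypothetical values.
clean_mag = np.array([[1.0, 0.5], [0.2, 0.0]])
noisy_mag = np.array([[2.0, 0.5], [0.8, 0.4]])

# Mapping: the network would be trained to output the clean magnitude itself.
mapping_target = clean_mag

# Masking: the network would be trained to output the mask instead.
masking_target = spectral_magnitude_mask(clean_mag, noisy_mag)

# With an ideal mask, filtering the noisy magnitude recovers the clean one.
enhanced_mag = apply_mask(masking_target, noisy_mag)
```

In this toy case every clean magnitude is no larger than the noisy one, so clipping has no effect and both targets describe the same enhanced magnitude; they differ in what the network is asked to predict.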
“…In Reference [30], an investigation is presented on the two speech enhancement learning domains, time, and frequency; while, the work in [31] explains how CNNs learn features from raw audio time series. In Reference [22], the effect of the speech enhancement training targets used for the MLP architecture was studied; and recently, this study was extended to include different architectures [32]. The use of different loss functions for the time domain approach for speech enhancement was also recently evaluated in [33].…”
Section: Problem Definition and Research Contribution (mentioning)
confidence: 99%
“…The magnitude power spectrum of the signal was then extracted with 256 FFT size, and the noisy phase was kept to be added to the estimated clean speech, while assuming that the phase is less affected by the noise [94]. Magnitude spectrogram mapping is the training target used in all evaluations in order to ensure the good generalization for all architecture types [32].…”
Section: Training Setup (mentioning)
confidence: 99%
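
The setup described in the statement above (a 256-point FFT, with the noisy phase kept and recombined with the estimated clean magnitude) might be sketched for a single frame as follows. This is a minimal illustration under stated assumptions, not the citing authors' pipeline: windowing, frame overlap, and overlap-add are omitted, the plain magnitude is used rather than a power spectrum, the helper names `analyze_frame` and `synthesize_frame` are hypothetical, and the "estimate" is simply the noisy magnitude passed through unchanged in place of a trained network.

```python
import numpy as np

N_FFT = 256  # FFT size quoted in the statement above


def analyze_frame(noisy_frame):
    """Split one noisy frame into its magnitude and phase spectra."""
    spectrum = np.fft.rfft(noisy_frame, n=N_FFT)
    return np.abs(spectrum), np.angle(spectrum)


def synthesize_frame(estimated_mag, noisy_phase):
    """Recombine an estimated magnitude with the stored noisy phase."""
    spectrum = estimated_mag * np.exp(1j * noisy_phase)
    return np.fft.irfft(spectrum, n=N_FFT)


# Hypothetical noisy frame: random samples stand in for real audio.
rng = np.random.default_rng(0)
noisy_frame = rng.standard_normal(N_FFT)

noisy_mag, noisy_phase = analyze_frame(noisy_frame)

# Placeholder for the enhancement model: pass the magnitude through unchanged.
estimated_mag = noisy_mag

enhanced_frame = synthesize_frame(estimated_mag, noisy_phase)
```

Because the placeholder leaves the magnitude untouched, `enhanced_frame` reproduces `noisy_frame` up to floating-point error; with a real model, only `estimated_mag` would change while the phase is still taken from the noisy input.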