Interspeech 2019
DOI: 10.21437/interspeech.2019-2648

Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN

Abstract: The quality of speech synthesis systems can be significantly deteriorated by the presence of background noise in the recordings. Despite the existence of speech enhancement techniques for effectively suppressing additive noise under low signal-to-noise ratio (SNR) conditions, these techniques have been neither designed nor tested in speech synthesis tasks where background noise has relatively lower energy. In this paper, we propose a speech enhancement technique based on generative adversarial networks (GANs) which a…

Cited by 21 publications (9 citation statements)
References 16 publications
“The values for batch size (4, 8, 12), number of flow blocks (8, 12, 16) and the amount of samples grouped together as input (8, 12, 24) were selected in a hyper-parameter search. As a loss function for training, the model minimized the negative log-likelihood of the given data.…”
Section: Training Strategy
confidence: 99%
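The training strategy quoted above comes down to minimizing the negative log-likelihood (NLL) of the data under the model. A minimal, self-contained sketch of that objective follows — a toy 1-D Gaussian fitted by gradient descent rather than an actual flow model; all names and values are illustrative, not taken from the cited paper:

```python
import numpy as np

# Toy NLL training: fit the mean and log-std of a 1-D Gaussian to data
# by gradient descent. A flow model minimizes the same objective, only
# with a far richer parameterized density.

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)  # synthetic "training set"

mu, log_sigma = 0.0, 0.0   # initial parameters
lr = 0.1

def nll(mu, log_sigma, x):
    """Mean negative log-likelihood of x under N(mu, sigma^2)."""
    sigma = np.exp(log_sigma)
    return np.mean(0.5 * ((x - mu) / sigma) ** 2
                   + log_sigma + 0.5 * np.log(2 * np.pi))

for _ in range(200):
    sigma = np.exp(log_sigma)
    # Analytic gradients of the mean NLL w.r.t. mu and log_sigma.
    grad_mu = np.mean((mu - data) / sigma ** 2)
    grad_log_sigma = np.mean(1.0 - ((data - mu) / sigma) ** 2)
    mu -= lr * grad_mu
    log_sigma -= lr * grad_log_sigma

fitted_mu, fitted_sigma = mu, np.exp(log_sigma)
```

After training, `fitted_mu` and `fitted_sigma` recover the parameters that generated the data, which is exactly what driving the NLL down means.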
“…This approach has been extended multiple times, e.g. by making use of the Wasserstein distance [16] or by combining multiple generators to increase performance [17]. Others reported strong SE results working with GANs to estimate clean T-F spectrograms by implementing additional techniques like a mean squared error regularization [12] or optimizing the network directly with respect to a speech specific evaluation metric [13].…”
Section: Introduction
confidence: 99%
“…To address this problem, the Wasserstein distance [16][17][18] has been introduced to improve the conditional GAN loss, resulting in the Wasserstein GAN (WGAN) method that achieves better objective performance than SEGAN [15,19]. The WGAN method is further improved in [20] by employing metric evaluation in the conditional GAN loss, and leads to the Metric GAN method, which outperforms WGAN based methods for speech enhancement.…”
Section: Introduction
confidence: 99%
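The excerpt above contrasts the standard conditional GAN loss with the Wasserstein distance used by WGAN-style methods. A minimal sketch of the WGAN critic objective with weight clipping follows — the critic here is a bare linear scorer on synthetic features purely for illustration; none of the names or data come from the cited papers:

```python
import numpy as np

# WGAN critic sketch: the critic maximizes E[f(real)] - E[f(fake)]
# over 1-Lipschitz functions f, here enforced crudely by weight clipping
# as in the original WGAN formulation.

rng = np.random.default_rng(1)

def critic(w, x):
    return x @ w  # linear critic score per sample

d = 4
w = np.zeros(d)
real = rng.normal(1.0, 1.0, size=(256, d))  # stand-in "clean" features
fake = rng.normal(0.0, 1.0, size=(256, d))  # stand-in "enhanced" features

for _ in range(100):
    # Loss = -(mean critic(real) - mean critic(fake)); for a linear
    # critic its gradient w.r.t. w is fake.mean(0) - real.mean(0).
    grad = fake.mean(axis=0) - real.mean(axis=0)
    w -= 0.05 * grad
    w = np.clip(w, -0.01, 0.01)  # clipping keeps the critic Lipschitz-bounded

# The trained critic's score gap estimates (a lower bound on) the
# Wasserstein distance between the two distributions, up to scale.
wasserstein_estimate = critic(w, real).mean() - critic(w, fake).mean()
```

Because `real` and `fake` have different means, the estimate comes out strictly positive; the generator in a full WGAN would then be trained to shrink exactly this gap.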
“…One approach to solve this issue is to preprocess speech data before using it for training the TTS model. [6] and [7] processed noisy speech data using speech enhancement techniques for noise-robust TTS systems. Speech enhancement research has mainly focused on removing other types of noise rather than BGM, but the primary removal target for broadcasted data is BGM.…”
Section: Introduction
confidence: 99%