2018
DOI: 10.1007/s00034-018-0798-4

A Conditional Generative Model for Speech Enhancement

Abstract: The version in the Kent Academic Repository may differ from the final published version. Users are advised to check http://kar.kent.ac.uk for the status of the paper. Users should always cite the published version of record.

Cited by 15 publications (18 citation statements)
References 23 publications
“…and fake signals, transmits information to G so that G can learn to produce output that resembles the realistic distribution of the clean signals. Using GANs, speech enhancement has been done using either magnitude spectrum input [18] or raw waveform input [14], [15]. Existing speech enhancement GAN (SEGAN) systems share a common feature: the enhancement mapping is accomplished in a single stage by a single generator G [14], [15], [18], which may not be optimal.…”
Section: Introduction (mentioning)
confidence: 99%
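For context, here is a minimal sketch in PyTorch of the adversarial training loop the quote describes, where D scores pairs of signals and feeds its judgment back to G. The generator G, discriminator D, and the L1 weight are hypothetical placeholders rather than the architecture of the cited SEGAN systems; only the conditional real-versus-fake loss structure follows the quoted description.

```python
import torch
import torch.nn.functional as F

def gan_enhancement_step(G, D, noisy, clean, opt_g, opt_d, l1_weight=100.0):
    """One adversarial step: D scores (signal, noisy) pairs, G enhances noisy speech."""
    # Train D: real pairs (clean, noisy) vs. fake pairs (G(noisy), noisy).
    enhanced = G(noisy)
    d_real = D(clean, noisy)
    d_fake = D(enhanced.detach(), noisy)
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Train G: D's feedback pushes G's output toward the clean-signal distribution;
    # an L1 term (common in practice, assumed here) keeps it close to the target.
    d_fake = D(enhanced, noisy)
    loss_g = (F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake)) +
              l1_weight * F.l1_loss(enhanced, clean))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```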
“…Using GANs, speech enhancement has been done using either magnitude spectrum input [18] or raw waveform input [14], [15]. Existing speech enhancement GAN (SEGAN) systems share a common feature: the enhancement mapping is accomplished in a single stage by a single generator G [14], [15], [18], which may not be optimal. Here, we aim to divide the enhancement process into multiple stages and accomplish it via multiple enhancement mappings, one at each stage.…”
Section: Introduction (mentioning)
confidence: 99%
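The multi-stage idea in this quote could look like the following sketch: a chain of generators in which each stage refines the previous stage's output. The class and the stage modules are hypothetical illustrations, not the citing paper's actual architecture.

```python
import torch.nn as nn

class MultiStageEnhancer(nn.Module):
    """Chains per-stage enhancement mappings; the stage modules are placeholders."""
    def __init__(self, stages):
        super().__init__()
        self.stages = nn.ModuleList(stages)

    def forward(self, noisy):
        x = noisy
        stage_outputs = []        # intermediate outputs, usable for per-stage losses
        for stage in self.stages:
            x = stage(x)          # one enhancement mapping per stage
            stage_outputs.append(x)
        return stage_outputs      # last element is the final enhanced signal
```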
“…Speech enhancement is useful in many applications, such as speech recognition [1,2,3] and hearing aids [4,5]. Recently, the research community has witnessed a shift in methodology from conventional signal processing methods [6,7] to data-driven enhancement approaches, particularly those based on deep learning paradigms [8,9,3,10,11]. Besides discriminative modeling with typical deep network variants, such as deep neural networks (DNNs) [8], convolutional neural networks (CNNs) [9,10], and recurrent neural networks (RNNs) [11,3], generative modeling with GANs [12] has been shown to hold promise for speech enhancement [13,14,15].…”
Section: Introduction (mentioning)
confidence: 99%
“…Different input types have been exploited, e.g. raw waveform [15,16] and time-frequency image [9,14]. Better losses, such as the Wasserstein loss [14], the relativistic loss [16], and the metric loss [14], have been tailored to stabilize the training process.…”
Section: Introduction (mentioning)
confidence: 99%
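As a rough illustration of the loss variants named above, these are textbook-style discriminator objectives for the Wasserstein and relativistic formulations (gradient penalties and other stabilizers omitted). They are generic forms, not necessarily the exact losses used in [14] and [16].

```python
import torch
import torch.nn.functional as F

def wasserstein_d_loss(d_real, d_fake):
    # Critic widens the score gap between real and fake samples.
    return d_fake.mean() - d_real.mean()

def relativistic_d_loss(d_real, d_fake):
    # D is trained to judge real data as more realistic than fake data.
    return F.binary_cross_entropy_with_logits(
        d_real - d_fake, torch.ones_like(d_real))
```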
“…This combines a DBF for extracting robust features with the posteriors of the DNN for improved model capability, obtaining more and better phoneme information for the TV modeling, further enhancing LID performance. These advances clearly demonstrate the relevance of phonetic-aware, ASR-trained DNNs to LID. Generative adversarial networks (GANs) [10] have recently become very popular for signal generation and processing in areas such as image generation [11], image-to-image translation [12,13,14], and speech enhancement [15]. A GAN consists of a generator that produces fake data from noise and a discriminator that distinguishes between fake and real data.…”
Section: Introduction (mentioning)
confidence: 99%