Speech Enhancement Based on A New Architecture of Wasserstein Generative Adversarial Networks

Ye, Shuaishuai; Jiang, Ting; Qin, Shan; Zou, Weixia; Deng, Chengyun

doi:10.1109/iscslp.2018.8706647

Cited by 11 publications

(4 citation statements)

References 17 publications

“…Recently, GAN models are shown to boost the generalization performance, and improve the quality of enhanced speech in the T-F domain [14,15], as shown in the speech enhancement generative adversarial network (SEGAN) where conditional GAN is used for speech enhancement. Although SEGAN achieves good performance measured in subjective metrics, the performance measured via objective metrics such as signal-to-noise ratio (SNR) tends to be degraded, which is caused by the vanishing gradient problem during training with the conditional GAN loss [14].…”

Section: Introduction (mentioning)

confidence: 99%

“…To address this problem, the Wasserstein distance [16][17][18] has been introduced to improve the conditional GAN loss, resulting in the Wasserstein GAN (WGAN) method that achieves better objective performance than SEGAN [15,19]. The WGAN method is further improved in [20] by employing metric evaluation in the conditional GAN loss, and leads to the Metric GAN method, which outperforms WGAN based methods for speech enhancement.…”

Section: Introduction (mentioning)

confidence: 99%

Time-domain Speech Enhancement with Generative Adversarial Learning

Xiao, Guan, Kong et al. 2021

Preprint

Speech enhancement aims to obtain speech signals with high intelligibility and quality from noisy speech. Recent work has demonstrated the excellent performance of time-domain deep learning methods, such as Conv-TasNet. However, these methods can be degraded by the arbitrary scales of the waveform induced by the scale-invariant signal-to-noise ratio (SI-SNR) loss. This paper proposes a new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN), which is an extension of the generative adversarial network (GAN) in time-domain with metric evaluation to mitigate the scaling problem, and provide model training stability, thus achieving performance improvement. In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN, and explain why it is better than the Wasserstein GAN. Experiments conducted demonstrate the effectiveness of our proposed method, and illustrate the advantage of Metric GAN.

“…Generative SE models have been accompanied by a discriminator whose task is to distinguish the original clean samples from enhanced samples. Not only does this improve the perceptual quality and intelligibility of the samples generated from the encoder-decoder generator, the addition of an adversarial model further compensates the distorted clean distributions in a generative adversarial network (GAN) SE system [14,15,16,17]. Therefore, high mean opinion scores on subjective tests can be achieved by providing more realistic and pleasant enhanced speech signals to listeners.…”

Section: Introduction (mentioning)

confidence: 99%

Speech Enhancement Based on Cyclegan with Noise-informed Training

Ting, Wang, Chang et al. 2021

Preprint

Speech enhancement (SE) approaches can be classified into supervised and unsupervised categories. For unsupervised SE, a well-known cycle-consistent generative adversarial network (CycleGAN) model, which comprises two generators and two discriminators, has been shown to provide a powerful nonlinear mapping ability and thus achieve a promising noise-suppression capability. However, a low-efficiency training process along with insufficient knowledge between noisy and clean speech may limit the enhancement performance of the CycleGAN SE at runtime. In this study, we propose a novel noise-informed-training CycleGAN approach that incorporates additional inputs into the generators and discriminators to assist the CycleGAN in learning a more accurate transformation of speech signals between the noise and clean domains. The additional input feature serves as an indicator that provides more information during the CycleGAN training stage. Experiment results confirm that the proposed approach can improve the CycleGAN SE model while achieving a better sound quality and fewer signal distortions.

“…SEGAN [15] is the first approach to apply GAN to SE task, which models a mapping between clean waveform and noisy waveform in an end-to-end way. Because of the unstable training process, other GAN-based systems utilize the other objective function to stabilize the training process, such as WGAN [18], SERGAN [19]. All of these GAN-based SE systems apply the U-Net architecture in the generator network from image-to-image translation [20] directly.…”

Section: Introduction (mentioning)

confidence: 99%

DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network for Speech Enhancement

Huang, Wu, Huang et al. 2020

Preprint

Generative adversarial network (GAN) still exists some problems in dealing with speech enhancement (SE) task. Some GAN-based systems adopt the same structure from Pixel-to-Pixel directly without special optimization. The importance of the generator network has not been fully explored. Other related researches change the generator network but operate in the time-frequency domain, which ignores the phase mismatch problem. In order to solve these problems, a deep complex convolution recurrent GAN (DCCRGAN) structure is proposed in this paper. The complex module builds the correlation between magnitude and phase of the waveform and has been proved to be effective. The proposed structure is trained in an end-to-end way. Different LSTM layers are used in the generator network to sufficiently explore the speech enhancement performance of DCCRGAN. The experimental results confirm that the proposed DCCRGAN outperforms the state-of-the-art GAN-based SE systems.
