Time-domain Speech Enhancement with Generative Adversarial Learning

Xiao, Feiyang; Guan, Jian; Kong, Qiuqiang; Wang, Wenwu

doi:10.48550/arxiv.2103.16149

Cited by 2 publications

(2 citation statements)

References 20 publications

“…In addition, showed how the method may be used to regenerate whispered speech. The research [37] offers time-domain SE using GAN, an extension of the generative adversarial network in the time-domain with metric assessment to alleviate the scale issue and give model training stability, thereby improving performance. In addition, provides a novel approach based on objective function mapping to analyse Metric GAN’s performance and explain why it is superior to Wasserstein GAN.…”

Section: Related Studies (mentioning)

confidence: 99%

End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement

Ullah, Wuttisittikulkij, Chaudhary, et al. 2022. Sensors

Because of their simple design, end-to-end deep learning (E2E-DL) models have gained a lot of attention for speech enhancement. A number of DL models have achieved excellent results in eliminating background noise and enhancing the quality and intelligibility of noisy speech. Designing resource-efficient and compact models for real-time processing is still a key challenge. To improve the performance of E2E models, the sequential and local characteristics of the speech signal should be modeled efficiently. In this paper, we present resource-efficient and compact neural models for end-to-end noise-robust waveform-based speech enhancement. Combining the Convolutional Encoder-Decoder (CED) and Recurrent Neural Networks (RNNs) in the Convolutional Recurrent Network (CRN) framework, we develop different speech enhancement systems. Different noise types and speakers are used to train and test the proposed models. Experiments on the LibriSpeech and DEMAND datasets show that the proposed models improve quality and intelligibility with fewer trainable parameters, notably lower model complexity, and shorter inference time than existing recurrent and convolutional models. Quality and intelligibility are improved by 31.61% and 17.18%, respectively, over the noisy speech. We further performed cross-corpus analysis to demonstrate the generalization of the proposed E2E SE models across different speech datasets.

“…Xiao et al. [13] developed time domain speech enhancement using generative adversarial network (GAN) to improve the performance of the generator and also compared different GANs available for speech enhancement. Tan et al. [14] proposed an end-to-end multi task model for VAD which increases the robustness of VAD system for low SNR conditions.…”

Section: Introduction (mentioning)

confidence: 99%

Short-term uncleaned signal to noise threshold ratio based end-to-end time domain speech enhancement in digital hearing aids

Nimmagadda, Swamy, Prathima, et al. 2022. IJEECS

This paper presents improvements to a combined solution for noise estimation and speech enhancement in digital hearing aids in the time domain. This study focuses on single-channel statistical temporal speech enhancement using adaptive Wiener filtering. In this technique, the noise is updated based on the short-term uncleaned signal to noise threshold ratio (ST-USNTR) of the frame. It works best if and only if the background noise level is low compared to that of the speech of interest. We chose time-domain algorithms to account for the time-varying nature of the speech signal. The performance of the proposed algorithm is evaluated for speech signals with seven types of noise and three signal-to-noise ratio (SNR) levels for each type of noise. The results show that a basic level of adaptive speech enhancement is obtained using statistical parameters of the noisy speech without the need for a reference input.
