This study addresses the challenge of Speech Enhancement (SE) in noisy environments, where the deployment of Deep Neural Network (DNN) solutions on microcontroller units (MCUs) is hindered by their computational demands. To close this gap, we introduce an SE method optimized for MCUs that employs a 2-layer GRU model and exploits perceptual speech properties together with tailored training strategies. By incorporating self-reference signals and a dual compression-and-recovery scheme based on the Mel scale, we obtain an efficient model suited to low-latency applications. The resulting GRU-2L-128 model achieves a 14.2× reduction in model size and a 409.1× reduction in operations compared with conventional DNN methods such as DCCRN, without compromising enhancement performance. These results offer a practical path to improved speech intelligibility on resource-constrained devices.
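The abstract does not give implementation details, so the sketch below is only a rough illustration of the named ingredients: a 2-layer, 128-unit GRU that estimates a suppression mask on Mel-compressed magnitude spectra, with recovery to linear-frequency bins via the filterbank pseudo-inverse. The filterbank construction, feature dimensions (n_freqs, n_mels), the sigmoid masking head, and the pseudo-inverse recovery step are all assumptions for illustration, not the authors' actual method.

```python
# Illustrative sketch (not the paper's code): 2-layer GRU masking model with
# Mel-scale compression of the input spectrum and pseudo-inverse recovery.
# Hyperparameters (n_freqs=257, n_mels=64, 16 kHz) are assumed, not from the paper.
import math
import torch
import torch.nn as nn

def mel_filterbank(n_freqs: int, n_mels: int, sample_rate: int = 16000) -> torch.Tensor:
    """Triangular Mel filterbank of shape (n_mels, n_freqs)."""
    f_max = sample_rate / 2.0
    hz_to_mel = lambda f: 2595.0 * math.log10(1.0 + f / 700.0)
    # Band edges: uniform on the Mel scale, converted back to Hz.
    m_pts = torch.linspace(hz_to_mel(0.0), hz_to_mel(f_max), n_mels + 2)
    f_pts = 700.0 * (10.0 ** (m_pts / 2595.0) - 1.0)
    freqs = torch.linspace(0.0, f_max, n_freqs)  # FFT bin center frequencies
    fb = torch.zeros(n_mels, n_freqs)
    for i in range(n_mels):
        left, center, right = f_pts[i], f_pts[i + 1], f_pts[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        fb[i] = torch.clamp(torch.minimum(rising, falling), min=0.0)
    return fb

class GRU2L128(nn.Module):
    """2-layer GRU (hidden size 128) gain-mask model on Mel-compressed magnitudes."""
    def __init__(self, n_freqs: int = 257, n_mels: int = 64):
        super().__init__()
        fb = mel_filterbank(n_freqs, n_mels)
        self.register_buffer("fb", fb)                          # compression matrix
        self.register_buffer("fb_inv", torch.linalg.pinv(fb))   # recovery matrix
        self.gru = nn.GRU(n_mels, 128, num_layers=2, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(128, n_mels), nn.Sigmoid())

    def forward(self, mag: torch.Tensor) -> torch.Tensor:
        # mag: (batch, frames, n_freqs) noisy magnitude spectrogram.
        mel = mag @ self.fb.T            # compress to Mel bands
        h, _ = self.gru(mel)
        gain = self.mask(h)              # per-band gain in [0, 1]
        mel_enh = mel * gain             # suppress noisy bands
        return torch.clamp(mel_enh @ self.fb_inv.T, min=0.0)  # back to linear bins

# Usage: enhanced = GRU2L128()(torch.rand(1, 100, 257))  # -> (1, 100, 257)
```

Operating on a few dozen Mel bands instead of hundreds of FFT bins is one plausible way the reported size and operation reductions could be realized, since the GRU input width, and hence its weight matrices, shrink accordingly.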