2020
DOI: 10.48550/arxiv.2008.04259
Preprint

A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech

Cited by 6 publications
(8 citation statements)
“…While their tendency to impart an unnatural frequency response to the speech of the wearer makes them unattractive as a direct source of voice recording, their isolated perspective provides incredibly potent auxiliary data points. A combination of recent work in audio de-noising, such as RNNoise [40] and PercepNet [41], with the bone conduction methods presented in this paper may enable a new generation of highly effective noise removal with minimal power requirements.…”
Section: Discussion (mentioning)
confidence: 99%
“…Higher values indicate better performance. First, we conduct extensive experiments on VoiceBank + DEMAND to compare the proposed methods with several state-of-the-art (SOTA) full-band and super-wideband SE methods, including RNNoise [3], PercepNet [4], DC-CRN [7] (super-wideband version), DeepFilterNet [5] and S-DCCRN [6]. We also conduct ablation study to show the importance of phase recovery for the low-frequency band (i.e., w/o SR-Net), which removes the complex spectral refinement in LF-Net.…”
Section: B Implementation Setup (mentioning)
confidence: 99%
“…In [3], Bark-scale spectrum with 22-dimensional Bark-frequency cepstral coefficients (BFCC) was adopted as input features and 22 ideal critical band gains were mapped, which can reduce the model size and computational complexity simultaneously. More recently, based on the human perception of speech signals, PercepNet developed a perceptual band representation with 32 triangular spectral bands [4], spaced according to the human hearing equivalent rectangular bandwidth (ERB). Obviously, the frequency resolution of the spectrum in Bark scale and that in ERB scale are much lower than the Fourier spectrum, leading to the leakage of information among frequency bands.…”
Section: Introduction (mentioning)
confidence: 99%
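The band representation described in the statement above can be illustrated with a minimal sketch. It is not PercepNet's actual implementation: it assumes the Glasberg and Moore ERB-rate formula, a 48 kHz sample rate (24 kHz Nyquist), band centers spaced uniformly on the ERB-rate scale, and the band count of 32 quoted in the statement. All function names are hypothetical.

```python
import math

def hz_to_erb_rate(f_hz):
    # Glasberg & Moore ERB-rate scale (assumed here; the paper's exact layout may differ)
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(erb):
    # Inverse of hz_to_erb_rate
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def erb_band_centers(n_bands, f_max_hz):
    # Band centers spaced uniformly on the ERB-rate scale from 0 Hz to f_max_hz
    e_max = hz_to_erb_rate(f_max_hz)
    return [erb_rate_to_hz(e_max * i / (n_bands - 1)) for i in range(n_bands)]

def triangular_weights(centers, f_hz):
    # Each frequency is shared linearly between its two neighboring band
    # centers, so overlapping triangles always sum to 1.
    f_hz = min(max(f_hz, centers[0]), centers[-1])  # clamp against rounding
    weights = [0.0] * len(centers)
    for b in range(len(centers) - 1):
        lo, hi = centers[b], centers[b + 1]
        if lo <= f_hz <= hi:
            frac = (f_hz - lo) / (hi - lo)
            weights[b] = 1.0 - frac
            weights[b + 1] = frac
            break
    return weights

def band_energies(power_spectrum, sample_rate, centers):
    # Collapse a one-sided power spectrum (bins from 0 Hz to Nyquist inclusive)
    # into per-band energies using the triangular weights.
    n_bins = len(power_spectrum)
    energies = [0.0] * len(centers)
    for k, p in enumerate(power_spectrum):
        f = k * (sample_rate / 2.0) / (n_bins - 1)
        for b, w in enumerate(triangular_weights(centers, f)):
            energies[b] += w * p
    return energies

# Hypothetical usage: 32 bands over a 48 kHz fullband signal, 481 spectral bins
centers = erb_band_centers(32, 24000.0)
flat = [1.0] * 481
energies = band_energies(flat, 48000.0, centers)
```

Because the band centers grow roughly exponentially with frequency, the low bands cover only a few Fourier bins while the high bands cover many. This is the loss of frequency resolution that the citing authors point to.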
“…In recent years, learning-based approaches have shown promising results [4,5,6]. The Deep Noise Suppression (DNS) Challenge organized at INTERSPEECH 2020 showed promising results, while also indicating that we are still about 1.4 Differential Mean Opinion Score (DMOS) from the ideal Mean Opinion Score (MOS) of 5 when tested on the DNS Challenge test set [7,8]. The DNS Challenge is the first contest that we are aware of that used subjective evaluation to benchmark SE methods using a realistic noisy test set [9].…”
Section: Introduction (mentioning)
confidence: 99%