2013 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2013.6639232

Adding controlled amount of noise to improve recognition of compressed and spectrally distorted speech

Abstract: This paper deals with the recognition of speech whose spectrum is notably distorted by lossy compression (namely MP3) or by some implementations of 'speech enhancement' techniques. We show that these non-linear treatments can introduce gaps in the spectrum that significantly change the distribution of MFCCs and degrade ASR performance. We propose a method that measures the level of spectrum distortion and uses it to add a controlled amount of noise to the signal. It effectively masks the gaps and helps namel…
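The abstract does not say how the distortion level is measured or how it is mapped to a noise level (the text is truncated), so the following is only an illustration under stated assumptions, not the authors' method. The hypothetical `spectral_gap_ratio` treats the fraction of STFT bins lying more than 60 dB below their own frame's peak as the distortion measure, and the hypothetical `add_controlled_noise` maps that fraction linearly to a target SNR between 40 dB and 20 dB before adding white Gaussian noise. All thresholds and the mapping are assumptions.

```python
import numpy as np


def spectral_gap_ratio(signal, frame_len=512, hop=256, floor_db=-60.0):
    """HYPOTHETICAL distortion measure: fraction of STFT bins that lie more
    than |floor_db| dB below the peak of their own frame ("spectral gaps")."""
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    peak = power.max(axis=1, keepdims=True) + 1e-30  # avoid division by zero
    rel_db = 10.0 * np.log10(power / peak + 1e-30)   # level relative to frame peak
    return float(np.mean(rel_db < floor_db))


def add_controlled_noise(signal, gap_ratio, max_snr_db=40.0, min_snr_db=20.0, rng=None):
    """Add white Gaussian noise whose level rises with gap_ratio.
    The linear gap_ratio -> SNR mapping is an ASSUMPTION for illustration."""
    rng = np.random.default_rng(rng)
    snr_db = max_snr_db - gap_ratio * (max_snr_db - min_snr_db)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise, snr_db


# Usage: white noise stands in for undistorted audio; zeroing the upper half
# of its spectrum stands in for a "gappy" compressed signal.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
spec = np.fft.rfft(clean)
spec[len(spec) // 2:] = 0.0
gappy = np.fft.irfft(spec, n=len(clean))
gap = spectral_gap_ratio(gappy)
masked, snr_db = add_controlled_noise(gappy, gap, rng=1)
```

The more bins that fall into gaps, the lower the target SNR and hence the more noise is added; a real system would calibrate both the gap criterion and the mapping on data.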

Cited by 6 publications (6 citation statements)
References 5 publications
“…The first MFCC coefficient was dropped, and the remaining ones were added to the feature vector. There is no clear consensus in the literature about the ideal MFCC count: some earlier studies employed 13 MFCC components (Beltrán et al., 2015; Chu et al., 2009; Terence et al., 2013), but other works included 15 (Phan et al., 2015), 16 (Mesaros et al., 2010), 20 (Ruiz-Martinez et al., 2013), 26 (Salamon et al., 2014) and 40 (Nouza et al., 2013).…”
Section: Recognition System (mentioning, confidence: 99%)
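The step this statement describes (computing cepstral coefficients from log mel energies and discarding the first coefficient, c0) can be sketched with an orthonormal DCT-II in NumPy. The function names, the 40-band input and the `n_keep=13` default are illustrative assumptions; the mel filterbank itself is not shown.

```python
import numpy as np


def dct_ii_ortho(x):
    """Orthonormal DCT-II along the last axis
    (same convention as scipy.fft.dct(x, type=2, norm="ortho"))."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]          # output (cepstral) index
    m = np.arange(n)[None, :]          # input (mel band) index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return x @ basis.T


def mfcc_from_log_mel(log_mel, n_keep=13, drop_c0=True):
    """Cepstral coefficients from log mel energies (frames x bands).
    With drop_c0=True the first coefficient (c0) is discarded and the
    next n_keep coefficients are returned."""
    ceps = dct_ii_ortho(log_mel)
    start = 1 if drop_c0 else 0
    return ceps[..., start:start + n_keep]


# Usage: 100 frames of (random, stand-in) log energies from 40 mel bands.
log_mel = np.random.default_rng(0).standard_normal((100, 40))
features = mfcc_from_log_mel(log_mel, n_keep=13)   # shape (100, 13)
```

Changing `n_keep` is all it takes to reproduce the different coefficient counts listed above; c0 is dropped because it mostly tracks overall frame energy.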
“…The authors of [4], [5] theoretically studied the effect of MP3 distortion on the extracted features. They argued that the main problem with using compressed data stems from "spectral holes" caused by MP3 masking, which significantly alter the 2nd and 3rd derivatives of standard spectral-based features.…”
Section: Introduction (mentioning, confidence: 99%)
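To illustrate why spectral holes disturb derivative features, the sketch below computes standard regression deltas on a single log band-energy track in which one frame has been zeroed out, as a masking-induced hole might be. The helper `deltas`, the window `N=2` and the 1e-10 floor standing in for the hole are hypothetical choices; applying `deltas` twice gives delta-delta features.

```python
import numpy as np


def deltas(feats, N=2):
    """Regression (delta) coefficients along the time axis (axis 0):
    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first and last frames replicated at the edges."""
    T = feats.shape[0]
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    out = np.zeros(feats.shape, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom


# One log band-energy track (20 frames x 1 band). Frame 10 holds a
# HYPOTHETICAL spectral hole: the band energy was zeroed and only a tiny
# floor of 1e-10 remains.
energy = np.ones((20, 1))
energy[10, 0] = 1e-10
log_e = np.log(energy)
d1 = deltas(log_e)   # first-order deltas
d2 = deltas(d1)      # delta-deltas
```

Without the hole every delta of this flat track is zero; with it, the log of the near-zero energy creates outliers of several units in both `d1` and `d2` around frame 10.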
“…The authors consistently reported a significant drop in accuracy for bit rates lower than 24 kbps, e.g., [3]–[7]. Several solutions have been proposed to improve recognition at lower bit rates, such as limiting the training signal bandwidth, using perceptual linear prediction (PLP) features, or adding a controlled amount of noise.…”
Section: MP3 Speech Recognition (mentioning, confidence: 99%)
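The first remedy mentioned, limiting the training signal bandwidth, can be sketched as a brick-wall FFT low-pass applied to training audio before feature extraction. `limit_bandwidth` and the 4 kHz cutoff are hypothetical; in practice the cutoff would be chosen to match the bandwidth of the compressed test data, and a system could instead restrict the upper edge of its mel filterbank.

```python
import numpy as np


def limit_bandwidth(signal, sample_rate, cutoff_hz):
    """Brick-wall low-pass: zero every FFT bin above cutoff_hz.
    A HYPOTHETICAL helper for band-limiting training audio."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


# Usage: a 1 kHz + 6 kHz test signal at 16 kHz, band-limited to an
# assumed 4 kHz cutoff; only the 1 kHz component should remain.
sr = 16000
t = np.arange(sr) / sr
train = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)
train_lp = limit_bandwidth(train, sr, cutoff_hz=4000.0)
```

A brick-wall filter is used here only for brevity; on real, non-periodic recordings it causes ringing, so a windowed FIR or IIR low-pass would normally be preferred.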
“…While the standard procedure to avoid infinite (Inf) values in logarithmic spectra is to add a small amount of uniformly distributed noise, adding relatively strong noise has been shown to improve the recognition of spectrally distorted speech [7]. This technique is referred to as additional dithering later in the text.…”
Section: Robust Front-end Processing (mentioning, confidence: 99%)
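The difference between standard dithering and stronger "additional dithering" can be shown on the log power spectrum of frames that contain exact or near-exact spectral zeros. `log_power_spectrum` and the dither amplitudes (1.0 and 100.0) are illustrative assumptions, not values taken from [7].

```python
import numpy as np


def log_power_spectrum(frame, dither=0.0, rng=None):
    """Log power spectrum of one frame after adding uniform noise in
    [-dither, dither]. dither=0 disables dithering."""
    rng = np.random.default_rng(rng)
    x = np.asarray(frame, dtype=float)
    if dither > 0:
        x = x + rng.uniform(-dither, dither, size=x.shape)
    power = np.abs(np.fft.rfft(x)) ** 2
    with np.errstate(divide="ignore"):   # log(0) -> -inf instead of a warning
        return np.log(power)


n = 256
silent = np.zeros(n)                                    # exact zeros, e.g. a fully masked frame
tone = 1000.0 * np.sin(2 * np.pi * 8 * np.arange(n) / n)  # energy in one bin only
no_dither = log_power_spectrum(silent)                   # every bin is -inf
weak = log_power_spectrum(tone, dither=1.0, rng=0)       # standard small dither
strong = log_power_spectrum(tone, dither=100.0, rng=0)   # "additional" dithering
```

Small dither already removes the infinite values; the stronger noise additionally raises the floor between spectral components (by about 40 dB in this example, since the same seed scales the identical draws by 100), which is what masks the gaps.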