2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2012.6288985

Comparing spectrum estimators in speaker verification under additive noise degradation

Abstract: Different short-term spectrum estimators for speaker verification under additive noise are considered. Conventionally, mel-frequency cepstral coefficients (MFCCs) are computed from discrete Fourier transform (DFT) spectra of windowed speech frames. Recently, linear prediction (LP) and its temporally weighted variants have been substituted for the DFT as the spectrum analysis method in speech and speaker recognition. In this paper, 12 different short-term spectrum estimation methods are compared for speaker verification un…
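As a rough illustration of the two families of front ends the abstract contrasts, the sketch below estimates the short-term power spectrum of one speech frame with (a) a windowed DFT periodogram and (b) an all-pole LP model fitted by the autocorrelation method; either spectrum could then be passed through a mel filterbank, log compression, and a DCT to obtain MFCCs. Frame length, window, FFT size, and LP order are illustrative assumptions, not the specific configurations among the 12 estimators compared in the paper.

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def dft_power_spectrum(frame, n_fft=512):
        # Windowed periodogram: Hamming window followed by |DFT|^2.
        win = np.hamming(len(frame))
        return np.abs(np.fft.rfft(frame * win, n_fft)) ** 2

    def lp_power_spectrum(frame, order=20, n_fft=512):
        # All-pole (LP) spectrum via the autocorrelation method.
        x = frame * np.hamming(len(frame))
        # Biased autocorrelation up to the LP order.
        r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
        a_pred = solve_toeplitz(r[:order], r[1:order + 1])   # predictor coefficients
        gain2 = r[0] - np.dot(a_pred, r[1:order + 1])        # prediction-error energy
        a = np.concatenate(([1.0], -a_pred))                 # error filter A(z)
        return gain2 / (np.abs(np.fft.rfft(a, n_fft)) ** 2)  # gain^2 / |A(e^jw)|^2

The temporally weighted LP variants mentioned in the abstract differ mainly in that the prediction error is minimized under a temporal weighting of the samples rather than uniformly, which changes the fitted all-pole envelope but not the overall MFCC pipeline.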

Cited by 15 publications (12 citation statements)
References 16 publications
“…At the feature level, [3] carried out an extensive comparison of several spectrum estimation methods under additive noise contamination and found that the best spectrum estimator depends on the noise type and level. Recent work [4,5], based on vector Taylor series (VTS) and later developed using "unscented transforms" [6], tried to model non-linear distortions in the cepstral domain with a non-linear noise model, in order to relate clean and noisy cepstral coefficients and help estimate a "cleaned-up" version of the i-vectors.…”
Section: Introduction
confidence: 99%
“…At a feature level, [3] carried out an extensive comparison of several spectrum estimation methods and found that the best estimator depends on the noise type and SNR level. Recent work [4,5], based on vector Taylor series (VTS) and later developed using "unscented transforms" [6], tried to model non-linear distortions in the cepstral domain with a non-linear noise model, in order to relate clean and noisy cepstral coefficients and help estimate a "cleaned-up" version of the i-vectors.…”
Section: Introduction
confidence: 99%
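For context, the VTS-style compensation referred to in [4,5,6] builds on a non-linear mismatch function between clean and noisy features. The sketch below shows the usual additive-noise form of that relation evaluated through the log-mel domain; it is a generic illustration under the assumption that the number of cepstral coefficients equals the number of filterbank channels (so the DCT is invertible), not the exact formulation used in those papers.

    import numpy as np
    from scipy.fftpack import dct, idct

    def vts_noisy_cepstrum(x_cep, n_cep):
        # Map clean-speech (x) and noise (n) cepstra back to the log-mel domain.
        x_log = idct(x_cep, norm='ortho')
        n_log = idct(n_cep, norm='ortho')
        # Power spectra of speech and additive noise add, so in the log domain:
        # y = x + log(1 + exp(n - x))
        y_log = x_log + np.log1p(np.exp(n_log - x_log))
        # Return to the cepstral domain: the expected noisy cepstrum y.
        return dct(y_log, norm='ortho')

VTS then linearizes this relation with a first-order Taylor expansion, while the unscented-transform variant propagates sigma points through the non-linearity instead of linearizing it.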
“…The resulting optimal weights for two different cases of ρ are presented, where the corresponding covariance matrix R_x is used in Equation 20. The AR(1) spectrum is a simple model but a reasonable one for speech, since speech data are often fitted with AR models of order 10-20. The average damping of the different poles (ρ) of such an AR spectrum estimated from real data gives an idea of which damping factor should be chosen for the AR(1) model when optimizing the weights.…”
Section: Variance of the Log-spectrum
confidence: 99%
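As a small illustration of the AR(1) model invoked in this statement, the sketch below constructs the covariance matrix R_x and the power spectrum of an AR(1) process with damping factor ρ. "Equation 20" and the weight optimization belong to the citing paper, so only the model itself is shown, with unit innovation variance as an assumption.

    import numpy as np
    from scipy.linalg import toeplitz

    def ar1_covariance(rho, n, sigma2=1.0):
        # [R_x]_{ij} = sigma2 * rho^|i-j| / (1 - rho^2) for x[t] = rho*x[t-1] + e[t].
        lags = sigma2 * rho ** np.arange(n) / (1.0 - rho ** 2)
        return toeplitz(lags)

    def ar1_spectrum(rho, omega, sigma2=1.0):
        # Power spectrum of the AR(1) model: sigma2 / |1 - rho*exp(-jw)|^2.
        return sigma2 / np.abs(1.0 - rho * np.exp(-1j * omega)) ** 2

A ρ close to 1 gives a strongly low-pass, speech-like spectral tilt, while ρ = 0 reduces the model to white noise.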
“…This estimator has been evaluated and compared with the Thomson multitapers, the sinusoidal multitapers, the Welch method, and conventional windowed periodogram-based cepstrum analysis for speaker recognition. The results of these studies show that a multitaper estimator optimized for a speech-like spectrum model has advantages over traditional techniques [14][15][16].…”
Section: Introduction
confidence: 99%
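The multitaper estimators compared in these studies average several independently tapered periodograms of the same frame. The sketch below uses sinusoidal (sine) tapers with uniform weights as a generic illustration; the Thomson (DPSS) tapers and the optimally weighted, speech-matched variant discussed above differ only in the choice of tapers and weights. Taper count and FFT size are illustrative assumptions.

    import numpy as np

    def sine_tapers(n, k):
        # The k lowest-order sine tapers: w_j(t) = sqrt(2/(n+1)) * sin(pi*j*t/(n+1)).
        j = np.arange(1, k + 1)[:, None]
        t = np.arange(1, n + 1)[None, :]
        return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * j * t / (n + 1))

    def multitaper_spectrum(frame, n_tapers=6, n_fft=512):
        # Average the K eigenspectra |DFT(taper_j * frame)|^2; averaging across
        # nearly uncorrelated tapers reduces the variance of the estimate
        # relative to a single windowed periodogram.
        tapers = sine_tapers(len(frame), n_tapers)
        eig = np.abs(np.fft.rfft(tapers * frame[None, :], n_fft, axis=1)) ** 2
        return eig.mean(axis=0)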