2015 World Congress on Information Technology and Computer Applications (WCITCA) 2015
DOI: 10.1109/wcitca.2015.7367018

An analysis and comparative evaluation of MFCC variants for speaker identification over VoIP networks

Cited by 8 publications (8 citation statements)
References 16 publications
“…The emphasizing step is firstly performed using a simple first order digital filter with transfer function H(z) = 1 - 0.95z^{-1}. Next, the emphasized speech signal is blocked into Hamming-windowed frames of 25 ms (400 samples) in length with 10 ms (160 samples) overlap between any two adjacent frames [5,48,49]. As regards the GFCCs, the features are extracted using a filter bank of 64 Gammatone filters and a down sampling frequency of 100 Hz (yielding frame rate of 10 ms), as recommended by [36].…”
Section: Experiments Results and Discussion 4.1 The Experimental Protocol (mentioning)
confidence: 99%
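
The front end described in the statement above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the cited authors' code: the function names (`pre_emphasize`, `hamming`, `frame_signal`) are hypothetical, the filter is taken to be the standard first-order pre-emphasis y[n] = x[n] - 0.95 x[n-1], and the 16 kHz sampling rate is inferred from 400 samples per 25 ms. The quote calls the 160 samples an "overlap", whereas a 10 ms frame shift is the more common reading, so the step between frames is left as the `hop` parameter (160 for a 10 ms shift, 240 for a literal 160-sample overlap).

```python
import math


def pre_emphasize(signal, alpha=0.95):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    out = [signal[0]]  # no previous sample exists for n = 0
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def hamming(length):
    """Symmetric Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_signal(signal, frame_len=400, hop=160):
    """Cut the signal into Hamming-windowed frames.

    frame_len=400 is 25 ms at an assumed 16 kHz sampling rate.
    hop is the step between frame starts (assumption: 160 = 10 ms shift).
    Trailing samples that do not fill a whole frame are dropped.
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i]
                       for i in range(frame_len)])
    return frames


if __name__ == "__main__":
    one_second = [1.0] * 16000  # dummy constant signal, 1 s at 16 kHz
    emphasized = pre_emphasize(one_second)
    frames = frame_signal(emphasized)
    print(len(frames), len(frames[0]))
```

Cepstral features such as MFCCs or GFCCs would then be computed per frame; that stage is not shown here.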
“…Instead of being used by themselves, the linear prediction coefficients were transformed into a set of robust, less correlated features such as, the linear prediction cepstral coefficients "LPCCs" [9], the perceptual linear prediction coefficients "PLP" [10], the perceptual linear predictive cepstral coefficients "PLPCC" [11][12][13][14][15] and the line spectral frequencies "LSF" [16] etc. In early 1980s, the so-called Mel-frequency cepstral coefficients "MFCCs" were introduced and yielded the best results compared to contemporary used features for speaker recognition [5,11]. One year later, the concept of dynamic features has been introduced to incorporate some temporal information to the extracted features [16,17].…”
Section: Literature Review (mentioning)
confidence: 99%
“…Some studies have realized speaker identification [20], [23], [30]-[44] by comparing the voice of a speaker with the voice of a pre-registered person. Speaker identification has been applied to video conferences [45], criminal investigations [46], and television programs [47].…”
Section: A. Speaker Recognition Using Stationary Device (mentioning)
confidence: 99%
“…Next, the emphasized speech signal is blocked into Hamming-windowed frames of 25 ms (400 samples) in length with 10 ms (160 samples) overlap between any two adjacent frames. Finally, 19 Mel-Frequency Cepstral Coefficients were extracted from each frame [15]. During the training phase, a universal background model (UBM) of 1024 Gaussian components was trained on the overall training data (7.5 hours of speech) using the EM algorithm.…”
Section: The Experimental Protocol (mentioning)
confidence: 99%