2015
DOI: 10.1121/1.4921679

Voice source characterization using pitch synchronous discrete cosine transform for speaker identification

Abstract: A characterization of the voice source (VS) signal by the pitch synchronous (PS) discrete cosine transform (DCT) is proposed. With the integrated linear prediction residual (ILPR) as the VS estimate, the PS DCT of the ILPR is evaluated as a feature vector for speaker identification (SID). On TIMIT and YOHO databases, using a Gaussian mixture model (GMM)-based classifier, it performs on par with existing VS-based features. On the NIST 2003 database, fusion with a GMM-based classifier using MFCC features improve…
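
As a rough illustration of the pitch synchronous DCT idea in the abstract, the sketch below cuts a voice source estimate into cycles at given glottal closure instants (GCIs) and takes a DCT-II of each cycle. It is only a sketch under stated assumptions: the function names, the unit-energy normalization, the zero padding, and the default of 20 coefficients are hypothetical choices for illustration, not details taken from the paper.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II of a sequence (direct O(n^2) evaluation)."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]


def ps_dct_features(source, gcis, num_coeffs=20):
    """One DCT feature vector per pitch cycle.

    source     : voice source estimate (for example an ILPR), list of floats
    gcis       : increasing sample indices of glottal closure instants
    num_coeffs : assumed number of leading DCT coefficients to keep
    """
    features = []
    for start, end in zip(gcis, gcis[1:]):
        cycle = source[start:end]
        if len(cycle) < 2:
            continue  # skip degenerate cycles
        # Assumption: scale each cycle to unit energy before the transform.
        energy = math.sqrt(sum(v * v for v in cycle)) or 1.0
        coeffs = dct_ii([v / energy for v in cycle])[:num_coeffs]
        # Assumption: zero-pad when a cycle is shorter than num_coeffs.
        coeffs += [0.0] * (num_coeffs - len(coeffs))
        features.append(coeffs)
    return features
```

Because every cycle yields a vector of the same length regardless of pitch period, the vectors can be passed to a classifier such as a GMM, as the abstract describes.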

Cited by 23 publications
(6 citation statements)
References 19 publications
“…CILPR is computed from the ILPR based voice source representation and captures the temporal shape of voice source signal between two GCIs. This feature has also been explored for speaker identification in [14]. ILPR is estimated by passing a non pre-emphasized version of speech signal through an LP inverse filter, the LP coefficients of the inverse filters are obtained from the corresponding pre-emphasized speech signal.…”
Section: CILPR For Excitation Source Characterization (mentioning)
confidence: 99%
“…Although pitch detection has been studied for decades, and a lot of achievements have obtained, it is still challenging to estimate pitch from a speech in the presence of strong noise. Actually, many applications are in a complicate and severe noise environment, so studying the pitch detection is meaningful for realisation technology of speech processing in very low signal‐to‐noise ratio (SNR), especially under the SNR of − 5 dB [1–4].…”
Section: Introduction (mentioning)
confidence: 99%
“…DCTILPR captures the glottal shape information of a speaker in a pitch synchronous manner. 11,12 But this does not capture the periodicity information of the signal denoting how much periodic it is. MPDSS feature is a variant of spectral flatness measure, and captures the periodicity information as the peak to dip ratio of the spectrum of a signal measures the periodicity.…”
Section: DCTILPR Feature (mentioning)
confidence: 99%
“…11,12 An epoch extraction algorithm is applied, and using these epochs, a voiced/unvoiced decision based on maximum normalized cross-correlation is applied as in Refs. 17 and 18.…”
Section: DCTILPR Feature (mentioning)
confidence: 99%