IEEE International Conference on Acoustics Speech and Signal Processing 2002
DOI: 10.1109/icassp.2002.5743814

Combining stochastic feature transformation and handset identification for telephone-based speaker verification

Abstract: The performance of telephone-based speaker verification systems can be severely degraded by the acoustic mismatch caused by telephone handsets. This paper proposes to combine a handset selector with stochastic feature transformation to reduce the mismatch. Specifically, a GMM-based handset selector is trained to identify the most likely handset used by the claimants, and then handset-specific stochastic feature transformations are applied to the distorted feature vectors. To overcome the non-linear distortion …
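
The two-stage idea in the abstract (pick the most likely handset, then apply that handset's transformation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the diagonal-covariance GMMs, the per-dimension affine form of the transformation, the handset labels `handset_A` and `handset_B`, and all parameter values are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(frames, gmm):
    """Total log-likelihood of all frames under a GMM.

    gmm is a list of (weight, mean, var) tuples (hypothetical layout)."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(comps)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total

def select_handset(frames, handset_gmms):
    """Return the label of the handset GMM that scores the utterance highest."""
    return max(handset_gmms, key=lambda k: gmm_loglik(frames, handset_gmms[k]))

def compensate(frames, handset_gmms, transforms):
    """Select a handset, then apply its (assumed affine) transformation."""
    handset = select_handset(frames, handset_gmms)
    scale, bias = transforms[handset]
    fixed = [[a * y + b for a, y, b in zip(scale, frame, bias)]
             for frame in frames]
    return handset, fixed

# Toy one-dimensional example with made-up parameters.
handset_gmms = {
    "handset_A": [(1.0, [2.0], [1.0])],
    "handset_B": [(1.0, [-2.0], [1.0])],
}
transforms = {
    "handset_A": ([1.0], [-2.0]),  # (scale, bias) per feature dimension
    "handset_B": ([1.0], [2.0]),
}
utterance = [[1.8], [2.3], [2.1]]
handset, compensated = compensate(utterance, handset_gmms, transforms)
```

In this toy setup the utterance frames lie near the mean of `handset_A`, so that handset is selected and its bias shifts the frames back toward zero. A real system would train the handset GMMs and transformation parameters on data rather than fix them by hand.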

Citations: cited by 22 publications (46 citation statements)
References: 10 publications
“…We used a GSM speech codec to transcode the HTIMIT corpus [9] and applied the resulting transcoded speech in a speaker verification experiment similar to [10] and [11]. HTIMIT was obtained by playing a subset of the TIMIT corpus through 9 different telephone handsets and one Sennheiser head-mounted microphone.…”
Section: Speaker Verification Experiments (mentioning)
confidence: 99%
“…As a result, there were handset- and coder-mismatches between speaker models and verification utterances. We used stochastic feature transformation with handset identification [10][13] to compensate for the mismatches. We assumed that a claimant will be asked to utter two sentences during a verification session.…”
Section: Speaker Verification Experiments (mentioning)
confidence: 99%
“…In our previous work [1], a handset selector is designed to identify the most likely handset used by the claimants. The handset's identity was then used to select the transformation parameters to recover the distorted speech.…”
Section: Cluster Selector (mentioning)
confidence: 99%
“…with mixing coefficients $\omega_j^X$, mean vectors $\mu_j^X$ and covariance matrices $\Sigma_j^X$ derived from the clean speech of several speakers (ten speakers in this work), the maximum-likelihood estimates of $\nu$ can be iteratively computed via the expectation-maximization (EM) algorithm [4] as follows [1] …”
Section: Stochastic Feature Transformation (mentioning)
confidence: 99%
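
The update equations themselves are elided in the quoted statement and are not reproduced here. As a rough illustration of how such an EM estimate can work, the sketch below handles only a simplified special case chosen for this example: a bias-only transformation x = y + b, estimated against a clean-speech GMM with diagonal covariances. The function names, the tuple layout of the GMM, and the toy data are all hypothetical, and this is not claimed to be the formulation used in [1].

```python
import math

def responsibilities(x, gmm):
    """Posterior probability of each mixture component given frame x.

    gmm is a list of (weight, mean, var) tuples with diagonal variances."""
    logs = []
    for w, mean, var in gmm:
        ll = sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                 for xi, m, v in zip(x, mean, var))
        logs.append(math.log(w) + ll)
    top = max(logs)
    probs = [math.exp(l - top) for l in logs]
    norm = sum(probs)
    return [p / norm for p in probs]

def estimate_bias(distorted, clean_gmm, iterations=20):
    """EM estimate of a bias b such that y + b fits the clean GMM."""
    dim = len(distorted[0])
    bias = [0.0] * dim
    for _ in range(iterations):
        num = [0.0] * dim
        den = [0.0] * dim
        for y in distorted:
            # E-step: component posteriors under the current bias.
            x = [yi + bi for yi, bi in zip(y, bias)]
            h = responsibilities(x, clean_gmm)
            # Accumulate precision-weighted statistics for the M-step.
            for hj, (_, mean, var) in zip(h, clean_gmm):
                for d in range(dim):
                    num[d] += hj * (mean[d] - y[d]) / var[d]
                    den[d] += hj / var[d]
        # M-step: closed-form maximizer of the auxiliary function.
        bias = [n / d for n, d in zip(num, den)]
    return bias

# Toy data: clean frames shifted by -1.0 per dimension (made-up values).
clean_gmm = [(0.5, [-3.0], [1.0]), (0.5, [3.0], [1.0])]
distorted = [[-4.5], [-4.0], [-3.5], [1.5], [2.0], [2.5]]
bias = estimate_bias(distorted, clean_gmm)
```

On this toy data the estimated bias should come out close to 1.0, undoing the shift. Richer transformations (for example, ones that also scale the features) would need additional parameters and a different M-step.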