Simplification and optimization of i-vector extraction

Glembek, Ondřej; Burget, Lukáš; Matějka, Pavel; Karafiát, Martin; Kenny, Patrick

doi:10.1109/icassp.2011.5947358

Cited by 102 publications

(106 citation statements)

References 3 publications

Supporting

Mentioning

105

Contrasting


“…The UBM and the i-vector extractor are estimated from appropriate training corpora. Methods to train the i-vector extractor and estimate the i-vectors can be found in (Dehak et al, 2011; Glembek et al, 2011).…”

Section: The I-vector Representation (mentioning)

confidence: 99%

From single to multiple enrollment i-vectors: Practical PLDA scoring variants for speaker verification

Rajan, Afanasyev, Hautamäki, et al. 2014

Digital Signal Processing

The availability of multiple utterances (and hence, i-vectors) for speaker enrollment brings up several alternatives for their utilization with probabilistic linear discriminant analysis (PLDA). This paper provides an overview of their effective utilization, from a practical viewpoint. We derive expressions for the evaluation of the likelihood ratio for the multi-enrollment case, with details on the computation of the required matrix inversions and determinants. The performance of five different scoring methods, and the effect of i-vector length normalization is compared experimentally. We conclude that length normalization is a useful technique for all but one of the scoring methods considered, and averaging i-vectors is the most effective out of the methods compared. We also study the application of multicondition training on the PLDA model. Our experiments indicate that multicondition training is more effective in estimating PLDA hyperparameters than it is for likelihood computation. Finally, we look at the effect of the configuration of the enrollment data on PLDA scoring, studying the properties of conditional dependence and number-of-enrollment-utterances per target speaker. Our experiments indicate that these properties affect the performance of the PLDA model. These results further support the conclusion that i-vector averaging is a simple and effective way to process multiple enrollment utterances.

“…A standard i-vector extractor was implemented for Kaldi as well (see footnote 1 in Page 2), based on the baseline system described in [23].…”

Section: I-vector System Configuration (mentioning)

confidence: 99%

Intra-class covariance adaptation in PLDA back-ends for speaker verification

Madikeri, Ferras, Motlíček, et al. 2017

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Multi-session training conditions are becoming increasingly common in recent benchmark datasets for both text-independent and text-dependent speaker verification. In the state-of-the-art i-vector framework for speaker verification, such conditions are addressed by simple techniques such as averaging the individual i-vectors, averaging scores, or modifying the Probabilistic Linear Discriminant Analysis (PLDA) scoring hypothesis for multi-session enrollment. The aforementioned techniques fail to exploit the speaker variabilities observed in the enrollment data for target speakers. In this paper, we propose to exploit the multi-session training data by estimating a speaker-dependent covariance matrix and updating the intra-speaker covariance during PLDA scoring for each target speaker. The proposed method is further extended by combining covariance adaptation and score averaging. In this method, the individual examples of the target speaker are compared against the test data as opposed to an averaged i-vector, and the scores obtained are then averaged. The proposed methods are evaluated on the NIST SRE 2012 dataset. Relative improvements of up to 29% in equal error rate are obtained.

“…Most GMM techniques use some variation of joint factor analysis (JFA) [25]. An offshoot of JFA is the i-vector technique which does away with the channel part of the model and falls back toward a PCA approach [26]. See Section 5.1 for more on the i-vector approach.…”

Section: Channel Mismatch (mentioning)

confidence: 99%

“…Glembek, et al [26] provide simplifications to the formulation of the i-vectors to reduce the memory usage and to increase the speed of computing the vectors. Glembek, et al [26] also explore linear transformations using principal component analysis (PCA) and Heteroscedastic Linear Discriminant Analysis (HLDA) [64] to achieve orthogonality of the components of the Gaussian mixture.…”

Section: The I-vector Model (Total Variability Space) (mentioning)

confidence: 99%

Speaker Recognition: Advancements and Challenges

Beigi 2012

New Trends and Developments in Biometrics

Speaker Recognition is a multi-disciplinary branch of biometrics that may be used for identification, verification, and classification of individual speakers, with the capability of tracking, detection, and segmentation by extension. Recently, a comprehensive book on all aspects of speaker recognition was published [1]. Therefore, here we are not concerned with details of the standard modeling which is and has been used for the recognition task. In contrast, we present a review of the most recent literature and briefly visit the latest techniques which are being deployed in the various branches of this technology. Most of the works being reviewed here have been published in the last two years. Some of the topics, such as alternative features and modeling techniques, are general and apply to all branches of speaker recognition. Some of these general techniques, such as whispered speech, are related to the advanced treatment of special forms of audio which have not received ample attention in the past. Finally, we will follow by a look at advancements which apply to specific branches of speaker recognition [1], such as verification, identification, classification, and diarization. This chapter is meant to complement the summary of speaker recognition, presented in [2], which provided an overview of the subject. It is also intended as an update on the methods described in [1]. In the next section, for the sake of completeness, a brief history of speaker recognition is presented, followed by sections on specific progress as stated above, for globally applicable treatment and methods, as well as techniques which are related to specific branches of speaker recognition. A brief history: The topic of speaker recognition [1] has been under development since the mid-twentieth century. The earliest known papers on the subject, published in the 1950s [3,4], were in search of finding personal traits of the speakers, by analyzing their speech, with some statistical underpinning.
With the advent of early communication networks, Pollack, et al. [3] noted the need for speaker identification. Although, they employed human listeners to do the identification of individuals and studied the importance of the duration of speech and other facets that help in the recognition of a speaker. In most of the early
