Interspeech 2018
DOI: 10.21437/interspeech.2018-2128

Fast Variational Bayes for Heavy-tailed PLDA Applied to i-vectors and x-vectors

Abstract: The standard state-of-the-art backend for text-independent speaker recognizers that use i-vectors or x-vectors is Gaussian PLDA (G-PLDA), assisted by a Gaussianization step involving length normalization. G-PLDA can be trained with either generative or discriminative methods. It has long been known that heavy-tailed PLDA (HT-PLDA), applied without length normalization, gives similar accuracy, but at considerable extra computational cost. We have recently introduced a fast scoring algorithm for a discriminativel…
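For context on the Gaussianization step the abstract refers to, the sketch below shows the usual centering-plus-length-normalization applied to i-vectors or x-vectors before G-PLDA scoring. It is a minimal illustration; the function name and the centering choice are assumptions, not taken from the paper.

```python
import numpy as np

def length_normalize(X, mean=None):
    """Center embeddings and project each onto the unit sphere.

    X: (n, d) array of i-vectors or x-vectors.
    mean: optional training-set mean; estimated from X if omitted.
    """
    if mean is None:
        mean = X.mean(axis=0)
    Xc = X - mean
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    return Xc / np.maximum(norms, 1e-12)  # guard against zero-norm rows
```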

Cited by 23 publications (19 citation statements) · References 9 publications
“…There is a consistent improvement over the systems without s-norm (15, 16, 17). Fusion of these three systems (18, 19, 20) forms our primary submission (system 23) to the fixed condition. We have also run a post-evaluation fusion with the same systems without s-norm (15, 16, 17), which is shown in row 24.…”
Section: Results and Analysis
confidence: 76%
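The s-norm in this quote is symmetric score normalization. A minimal sketch, assuming the trial and cohort scores come from the same backend; the function and argument names are illustrative, not from the cited system:

```python
import numpy as np

def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization (s-norm) of one trial score.

    enroll_cohort_scores: scores of the enrolment model against a cohort.
    test_cohort_scores: scores of the test segment against the same cohort.
    """
    z = (raw_score - enroll_cohort_scores.mean()) / enroll_cohort_scores.std()
    t = (raw_score - test_cohort_scores.mean()) / test_cohort_scores.std()
    return 0.5 * (z + t)  # average of the two normalized scores
```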
“…• Training networks with 9 epochs (instead of 3) [19]. It was trained on concatenated audio files from VoxCeleb 1 and 2. Length normalization, centering, and LDA reducing the dimensionality of the vectors to 300, followed by another length normalization, were applied to all i-vectors.…”
Section: X-vector Systems
confidence: 99%
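A minimal sketch of the preprocessing chain this quote describes (length normalization, centering, LDA to 300 dimensions, then a second length normalization), run on synthetic stand-in data; scikit-learn's LDA is an assumption here, not necessarily the estimator the cited system used:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def unit_norm(X):
    # Project each row onto the unit sphere (length normalization).
    return X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)

# Synthetic stand-ins: 400-dim "i-vectors" for 500 speakers, 10 each.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(500), 10)
X = rng.normal(size=(5000, 400)) + np.repeat(rng.normal(size=(500, 400)), 10, axis=0)

X = unit_norm(X)                      # first length normalization
X = X - X.mean(axis=0)                # centering
lda = LinearDiscriminantAnalysis(n_components=300).fit(X, y)
X = unit_norm(lda.transform(X))       # reduce to 300 dims, then renormalize
```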
“…For the baseline x-vector (architecture (a)), we used the generative Heavy-Tailed PLDA (HT-PLDA) classifier described in [11], as it was shown to outperform a Gaussian PLDA system. The HT-PLDA was trained using the x-vectors from the 485,385 VoxCeleb recordings, which we processed by centering and whitening; no unit-length projection was applied [10].…”
Section: HT-PLDA Scoring
confidence: 99%
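The centering-and-whitening step in this quote (with no unit-length projection, consistent with HT-PLDA's heavy-tailed assumptions) could look like the numpy sketch below; eigendecomposition-based whitening is one common choice and is assumed here, not confirmed by [11]:

```python
import numpy as np

def fit_whitener(X, eps=1e-8):
    """Estimate a centering mean and whitening matrix from training vectors."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs / np.sqrt(evals + eps)   # scale each eigenvector by 1/sqrt(eigenvalue)
    return mean, W

def whiten(X, mean, W):
    # Centered and whitened; deliberately no unit-length projection afterwards.
    return (X - mean) @ W
```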
“…Once the DNN is trained, the embeddings are extracted for each recording and compared using a similarity metric. The metric learning process is disjoint from the DNN training, and it is typically done using some variant of probabilistic linear discriminant analysis (PLDA) [8,9,10,11].…”
Section: Introduction
confidence: 99%
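As a concrete instance of the PLDA-style metric this quote mentions, here is a textbook two-covariance Gaussian PLDA log-likelihood-ratio score; the between- and within-speaker covariances B and W are assumed to come from a separate training step, and this is the simplest variant, not the specific backends of [8-11]:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gplda_llr(e, t, B, W):
    """Two-covariance Gaussian PLDA verification score.

    e, t: centered (d,) enrolment and test embeddings.
    B, W: (d, d) between- and within-speaker covariances.
    Returns log p(e, t | same speaker) - log p(e, t | different speakers).
    """
    d = e.shape[0]
    BW = B + W
    # Same speaker: the shared latent identity correlates e and t.
    cov_same = np.block([[BW, B], [B, BW]])
    # Different speakers: e and t are independent.
    cov_diff = np.block([[BW, np.zeros((d, d))], [np.zeros((d, d)), BW]])
    x = np.concatenate([e, t])
    return (multivariate_normal.logpdf(x, cov=cov_same)
            - multivariate_normal.logpdf(x, cov=cov_diff))
```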
“…Although adversarial learning based unsupervised DA [18,19] has greatly boosted the performance of SV systems under domain-mismatch scenarios, the adversarial training may lead to non-Gaussian latent vectors, which do not meet the Gaussianity requirement of the PLDA backend. This problem can be solved by using heavy-tailed PLDA [21,22] or applying i-vector length normalization [23]. However, the former is more computationally expensive than Gaussian PLDA, and the latter is not really a Gaussianization procedure but a sub-optimal compromise.…”
Section: Introduction
confidence: 99%