2021
DOI: 10.1109/taslp.2020.3039573

Deep Normalization for Speaker Vectors

Cited by 26 publications (13 citation statements)
References 47 publications
“…This contradiction between experimental observations and theoretical expectation deserves careful investigation of PLDA. In [7][8][9], Cai et al. argued that the problem arises from the neural speaker embeddings. It is noted that embeddings extracted from neural networks tend to be non-Gaussian for individual speakers, and the distributions across different speakers are non-homogeneous.…”
Section: Introduction
confidence: 99%
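
A quick empirical probe of this non-Gaussianity claim is straightforward. The sketch below (not from the cited papers; synthetic data stands in for real neural speaker embeddings) applies SciPy's D'Agostino-Pearson normality test per speaker:

```python
# Minimal sketch: test whether per-speaker "embedding" distributions
# look Gaussian. Synthetic 1-D data stands in for real embeddings.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

emb_speaker_a = rng.gamma(shape=2.0, scale=1.0, size=500)  # skewed
emb_speaker_b = rng.normal(loc=0.0, scale=1.0, size=500)   # Gaussian

for name, emb in [("speaker A", emb_speaker_a), ("speaker B", emb_speaker_b)]:
    # D'Agostino-Pearson test: a small p-value rejects Gaussianity.
    stat, p = stats.normaltest(emb)
    print(f"{name}: statistic={stat:.1f}, p-value={p:.3g}")
```
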
“…These irregular distributions cause performance degradation in verification systems with a PLDA back-end. From this perspective, a series of regularization approaches has been proposed to force the neural embeddings to be homogeneously Gaussian distributed, e.g., the Gaussian-constrained loss [7], the variational auto-encoder [8], and discriminative normalization flows [9,10].…”
Section: Introduction
confidence: 99%
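
To make the flow-based idea concrete, here is a minimal sketch of one RealNVP-style affine coupling layer trained by maximum likelihood to push synthetic 2-D "embeddings" toward a standard Gaussian. This is a generic normalizing-flow toy under assumed names, not the discriminative normalization flow of [9,10], which additionally conditions on speaker identity and stacks several layers:

```python
# Minimal sketch: one affine coupling layer trained so that z = f(x)
# is approximately standard normal (maximum-likelihood objective).
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """RealNVP-style layer: scales/shifts half the dimensions
    conditioned on the other half; log-determinant is tractable."""
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, 64), nn.Tanh(),
            nn.Linear(64, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(x1).chunk(2, dim=1)
        log_s = torch.tanh(log_s)             # keep scales bounded
        z2 = x2 * torch.exp(log_s) + t
        return torch.cat([x1, z2], dim=1), log_s.sum(dim=1)

x = torch.randn(2048, 2) ** 2                 # skewed, non-Gaussian data
flow = AffineCoupling(dim=2)
opt = torch.optim.Adam(flow.parameters(), lr=1e-2)

for step in range(500):
    z, log_det = flow(x)
    log_prior = -0.5 * (z ** 2).sum(dim=1)    # log N(z; 0, I), up to a const.
    loss = -(log_prior + log_det).mean()      # negative log-likelihood
    opt.zero_grad()
    loss.backward()
    opt.step()
```
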
“…Probabilistic Linear Discriminant Analysis (PLDA) [3]-[6] is commonly used as a back-end scoring model. To satisfy the Gaussian assumptions that PLDA places on its training data [7], extracted embeddings generally require pre-processing, including Linear Discriminant Analysis (LDA) and length normalization [8], before being used to train a PLDA model. Both LDA and PLDA are trained in a supervised manner that requires training data with corresponding speaker labels.…”
Section: Introduction
confidence: 99%
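
As a rough illustration of that front-end chain, the sketch below (assumed variable names, synthetic labelled data) runs supervised LDA and then length normalization; a real system would subsequently train PLDA on the normalized vectors:

```python
# Minimal sketch: LDA projection followed by length normalization,
# the usual pre-processing before PLDA training.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_spk, per_spk, dim = 20, 30, 64

# Synthetic labelled embeddings: one Gaussian cluster per speaker.
means = rng.normal(size=(n_spk, dim))
X = np.vstack([m + 0.5 * rng.normal(size=(per_spk, dim)) for m in means])
y = np.repeat(np.arange(n_spk), per_spk)

# Supervised LDA (at most n_spk - 1 output dimensions).
lda = LinearDiscriminantAnalysis(n_components=n_spk - 1)
X_lda = lda.fit_transform(X, y)

# Length normalization: project every vector onto the unit hypersphere.
X_norm = X_lda / np.linalg.norm(X_lda, axis=1, keepdims=True)
print(X_norm.shape, np.linalg.norm(X_norm[0]))  # rows now have unit length
```
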
“…This can be conducted either globally or class-dependently. The former is represented by the well-known length normalization [16], and the latter by the deep normalization model based on normalization flows [17].…”
Section: Introduction
confidence: 99%
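
The contrast between the two styles can be sketched as follows. Per-speaker standardization is used here only as a simple stand-in for the class-dependent map; the deep normalization model of [17] learns that map with normalization flows:

```python
# Minimal sketch: global vs. class-dependent normalization.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
y = rng.integers(0, 5, size=100)              # speaker labels

# Global: one fixed transform for every vector (length normalization).
X_global = X / np.linalg.norm(X, axis=1, keepdims=True)

# Class-dependent: each speaker gets its own transform (here a simple
# per-speaker standardization, standing in for a learned flow).
X_class = np.empty_like(X)
for spk in np.unique(y):
    idx = y == spk
    mu = X[idx].mean(axis=0)
    sigma = X[idx].std(axis=0) + 1e-8
    X_class[idx] = (X[idx] - mu) / sigma
```
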