The Speaker and Language Recognition Workshop (Odyssey 2018) 2018
DOI: 10.21437/odyssey.2018-24

On the use of X-vectors for Robust Speaker Recognition

Abstract: Text-independent speaker verification (SV) is currently in the process of embracing DNN modeling in every stage of SV system. Slowly, the DNN-based approaches such as end-to-end modelling and systems based on DNN embeddings start to be competitive even in challenging and diverse channel conditions of recent NIST SREs. Domain adaptation and the need for a large amount of training data are still a challenge for current discriminative systems and (unlike with generative models), we see significant gains from data…

Cited by 26 publications (27 citation statements)
References 21 publications
“…We show that, with such an approach, we can achieve a reasonable performance. Our results are perhaps not as competitive as those achieved with current state-of-the-art x-vector systems [18], nevertheless, we are now closer to our goal which is to further use this model in the fully end-to-end discriminative system [19] that can be initialized from a robust generative baseline. Figure 1: Scheme of an end-to-end speaker verification system based on a feed forward NN designed to mimic a generic speaker verification system ([19]).…”
Section: Introduction (mentioning)
confidence: 77%
“…We are fully aware that we do not reach the performance of x-vectors. Results presented here can be directly compared to our previous work [14] focused on analyzing the performance of the state-of-the-art i-vector and x-vector systems on the very same datasets. Here we present the i-vector system that is based purely on MFCCs, while in [14] we were using concatenation of MFCCs and DNN bottleneck features.…”
Section: Experiments and Discussion (mentioning)
confidence: 99%
“…To mix the reverberation, noise and signal at given SNR, we followed the procedure outlined in [14]. When jointly augmenting the data by noise and reverberation, the speech and noise are reverberated separately and different RIRs from the same room are used for speech signal and noise to simulate different positions of their sources.…”
Section: Composition Of the Augmented Training Set (mentioning)
confidence: 99%
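
The mixing step described in the statement above can be sketched roughly as follows. This is a minimal illustration and not the procedure of [14]: the function name `mix_reverberant`, its parameters, the looping of short noise clips, and the choice to measure the SNR on the reverberated signals are all assumptions made for the example. It only shows the general idea of reverberating speech and noise separately, with two different RIRs, before adding them at a target SNR.

```python
import numpy as np


def mix_reverberant(speech, noise, rir_speech, rir_noise, snr_db):
    """Reverberate speech and noise separately, then mix at a target SNR.

    Hypothetical helper for illustration only. All inputs are 1-D arrays at
    the same sampling rate. rir_speech and rir_noise stand for two different
    room impulse responses (for example, two source positions in one room).
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    n = len(speech)

    # Assumption: loop the noise if it is shorter than the speech signal.
    reps = int(np.ceil(n / len(noise)))
    noise = np.tile(noise, reps)[:n]

    # Each source gets its own RIR; outputs are truncated to the speech length.
    rev_speech = np.convolve(speech, rir_speech)[:n]
    rev_noise = np.convolve(noise, rir_noise)[:n]

    # Assumption: the SNR is defined on the reverberated signals.
    p_speech = np.mean(rev_speech ** 2)
    p_noise = np.mean(rev_noise ** 2)
    if p_noise == 0.0:
        raise ValueError("reverberated noise has zero power")

    # Scale the noise so that 10*log10(p_speech / p_scaled_noise) == snr_db.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return rev_speech + gain * rev_noise
```

A call such as `mix_reverberant(speech, noise, rir_a, rir_b, snr_db=10.0)` returns a signal of the same length as `speech`. Whether the SNR should instead be set before reverberation is a design choice that the quoted statement does not settle.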
“…After we explore the benefits of DNN-based audio pre-processing with standard generative SV systems based on i-vectors and PLDA, we attempt to improve an already better baseline system where DNN replaces the crucial i-vector extraction step. We use the architecture proposed by David Snyder (Snyder, 2017; Snyder et al., 2017) which already presents the x-vector (the embedding) as a robust feature for PLDA modeling, and provides state-of-the-art results across various acoustic conditions (Novotný et al., 2018b). We experiment with using the denoising autoencoder as a pre-processing step while training the x-vector extractor or just during the test stage.…”
Section: Introduction (mentioning)
confidence: 99%