The Speaker and Language Recognition Workshop (Odyssey 2018) 2018
DOI: 10.21437/odyssey.2018-15

Spoken Language Recognition using X-vectors

Abstract: In this paper, we apply x-vectors to the task of spoken language recognition. This framework consists of a deep neural network that maps sequences of speech features to fixed-dimensional embeddings, called x-vectors. Long-term language characteristics are captured in the network by a temporal pooling layer that aggregates information across time. Once extracted, x-vectors utilize the same classification technology developed for i-vectors. In the 2017 NIST language recognition evaluation, x-vectors achieved exce…
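The temporal pooling layer the abstract describes is, in the published x-vector recipe, a statistics pooling layer: the mean and standard deviation of the frame-level activations are concatenated into one segment-level vector. A minimal NumPy sketch of that aggregation (the 512-dimensional frame activations and 200-frame utterance length are illustrative, not values from the paper):

```python
import numpy as np

def stats_pooling(frame_embeddings):
    """Aggregate frame-level activations (T, D) into a single
    fixed-dimensional segment vector (2*D,) via mean + std."""
    mean = frame_embeddings.mean(axis=0)
    std = frame_embeddings.std(axis=0)
    return np.concatenate([mean, std])

# A 200-frame utterance with 512-dim frame-level activations
frames = np.random.randn(200, 512)
segment_vec = stats_pooling(frames)
print(segment_vec.shape)  # (1024,)
```

Because the pooled vector has a fixed dimension regardless of utterance length, the same back-end classifiers built for i-vectors can be applied to it unchanged.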


Cited by 176 publications (159 citation statements)
References 17 publications
“…It is known that neural network approaches are data-hungry. With data augmentation [20] and larger datasets like VoxCeleb 2 [21], neural network approaches achieve better performance than the i-vector method. Nevertheless, for applications with limited training data, i-vector warrants in-depth investigation.…”
Section: Speaker Verification on VoxCeleb (mentioning)
confidence: 99%
“…We use the x-vector system as described in [16], [17]. The raw feature of the system is 40-dimensional filterbanks.…”
Section: A. X-vector System (mentioning)
confidence: 99%
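The 40-dimensional filterbank front end that the citing paper mentions can be sketched in plain NumPy. The frame length, frame shift, FFT size, and sample rate below are common defaults (25 ms frames, 10 ms shift at 16 kHz), assumed here rather than taken from the quoted text:

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def log_mel_filterbanks(signal, sample_rate=16000, n_filters=40,
                        frame_len=400, frame_shift=160, n_fft=512):
    """Return (num_frames, n_filters) log mel-filterbank features."""
    # Frame the signal with a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    idx = (np.arange(frame_len)[None, :] +
           frame_shift * np.arange(n_frames)[:, None])
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular mel filters with centers equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2),
                          n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising slope
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling slope
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return np.log(power @ fbank.T + 1e-10)

# One second of noise at 16 kHz -> 98 frames of 40 filterbank values
feats = log_mel_filterbanks(np.random.randn(16000))
print(feats.shape)  # (98, 40)
```

In practice these features would come from a toolkit such as Kaldi, but the sketch shows the shape of the input the frame-level network layers consume.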
“…In summary, the PLDA model (whether Gaussian or heavy-tailed), provides the functional form (13), (14) and (15), for extracting Gaussian meta-embeddings from i-vectors. We shall explore both generative and discriminative methods for training the parameters of this GME extractor.…”
Section: GME Extractor and Scoring (mentioning)
confidence: 99%
“…The extractor parameters, W and F, are updated by backpropagating gradients through the BXE objective, through the scoring formula (9) and the extractor formula (13) and (14). The value of ν remains fixed at the plugged in value throughout training.…”
Section: Discriminative GME Extractor Training (mentioning)
confidence: 99%