IberSPEECH 2018
DOI: 10.21437/iberspeech.2018-1

Differentiable Supervector Extraction for Encoding Speaker and Phrase Information in Text Dependent Speaker Verification

Abstract: In this paper, we propose a new differentiable neural network alignment mechanism for text-dependent speaker verification which uses alignment models to produce a supervector representation of an utterance. Unlike previous works with similar approaches, we do not extract the embedding of an utterance from the mean reduction of the temporal dimension. Our system replaces the mean by a phrase alignment model to keep the temporal structure of each phrase which is relevant in this application since the phonetic in…
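
The contrast the abstract draws between mean reduction and an alignment-based supervector can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names are hypothetical, and the uniform segmentation used as the alignment is an assumed stand-in for whatever phrase alignment model the paper actually uses. The sketch only shows that a per-state average, concatenated in order, keeps temporal structure that a global mean discards.

```python
import numpy as np


def mean_embedding(frames):
    """Baseline: average over the time axis, which discards frame order."""
    return frames.mean(axis=0)


def uniform_alignment(num_frames, num_states):
    """Hypothetical stand-in alignment (an assumption, not the paper's model):
    assign frames to equal consecutive segments and return a one-hot
    matrix of shape (num_frames, num_states)."""
    states = (np.arange(num_frames) * num_states) // num_frames
    return np.eye(num_states)[states]


def supervector(frames, alignment):
    """Concatenate the alignment-weighted mean of the frames for each state.

    frames:    (T, D) frame-level features
    alignment: (T, S) weight of each frame for each state
    returns:   (S * D,) supervector
    """
    counts = alignment.sum(axis=0)                     # (S,)
    sums = alignment.T @ frames                        # (S, D)
    means = sums / np.maximum(counts, 1e-8)[:, None]   # guard against empty states
    return means.reshape(-1)


# Two "utterances" with the same frames in opposite order.
frames = np.arange(12, dtype=float).reshape(6, 2)
reversed_frames = frames[::-1]
alignment = uniform_alignment(num_frames=6, num_states=3)

same_mean = np.allclose(mean_embedding(frames), mean_embedding(reversed_frames))
same_supervector = np.allclose(
    supervector(frames, alignment), supervector(reversed_frames, alignment)
)
```

Here `same_mean` is true while `same_supervector` is false: the mean cannot tell the two orderings apart, whereas the supervector can. Because the supervector is built from matrix products and a division, it is differentiable with respect to the frame features, which is the property that would let such a layer sit inside a neural network.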

Citations: cited by 3 publications (10 citation statements)
References: 21 publications
“…The corpus is divided into three speaker subset: background (bkg), development (dev), and evaluation (eval). Unlike our previous work [25,15], in this paper, we only employ the bkg data (97 speakers, 47 female/50 male) for training, and reserve the dev data to scores normalization. The evaluation part is used for enrollment and trial evaluation.…”
Section: Methods | mentioning
confidence: 99%
“…In the following section, we present the structure of the system used for experiments Fig.1. First, we describe the frontend based on neural networks combined with the differentiable alignment mechanism proposed in our previous work [25,15]. Then, the back-end strategies are described.…”
Section: Supervector Neural Network System | mentioning
confidence: 99%
“…However, this approach does not work efficiently in text-dependent tasks since the uttered phrase is a relevant piece of information to correctly determine the identities and the system has to detect a match in the speaker and the phrase to be correct [6,9]. In our previous work [10], we have noted that part of the imprecisions may be derived from the use of the average as a representation of the utterance, and how this problem can be solved by adding a new internal layer into the deep neural network architecture which uses an alignment method to encode the temporal structure of the phrase in a supervector. In this paper, we propose a generalization for the use of different alignment mechanisms that can be employed in combination with the deep neural network to generate a differentiable supervector with good performances as will be shown in the experiments.…”
Section: Introduction | mentioning
confidence: 99%
“…In our previous work [10], we explored another reason for the lack of effectiveness in these tasks. The order of the phonetic information of the uttered phrase is relevant for the identification.…”
Section: Introduction | mentioning
confidence: 99%