2020
DOI: 10.1016/j.csl.2020.101078
Optimization of the area under the ROC curve using neural network supervectors for text-dependent speaker verification

Abstract: This paper explores two techniques to improve the performance of text-dependent speaker verification systems based on deep neural networks. Firstly, we propose a general alignment mechanism to keep the temporal structure of each phrase and obtain a supervector with the speaker and phrase information, since both are relevant for text-dependent verification. As we show, it is possible to use different alignment techniques to replace global average pooling, providing significant gains in performance. Moreove…
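The contrast the abstract draws between global average pooling and an alignment that preserves temporal structure can be sketched as follows. This is a hypothetical illustration, not the paper's method: the equal-length state split stands in for the alignment techniques the authors actually propose.

```python
import numpy as np

def global_average_pooling(frames):
    # frames: (T, D) frame-level features -> one (D,) embedding.
    # Temporal order is lost, which discards phrase structure.
    return frames.mean(axis=0)

def aligned_supervector(frames, n_states=4):
    # Hypothetical stand-in for an alignment layer: split the
    # utterance into equal-length temporal states, average within
    # each state, and concatenate the state means. The resulting
    # supervector keeps coarse temporal (phrase) structure.
    chunks = np.array_split(frames, n_states, axis=0)
    return np.concatenate([c.mean(axis=0) for c in chunks])

T, D = 100, 8
frames = np.random.randn(T, D)
print(global_average_pooling(frames).shape)  # (8,)
print(aligned_supervector(frames).shape)     # (32,)
```

Both functions consume the same frame-level features; only the supervector retains which part of the phrase each statistic came from.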

Cited by 29 publications (29 citation statements)
References 51 publications (77 reference statements)
“…Unlike the previous systems, we showed in [11] that, for the recognition task, direct optimization of the AUC metric is a better and more intuitive option than traditional objectives, since this metric directly measures system performance. Thus, we have also used this approach for the language recognition task, with the differentiable approximation of the AUC function applied in [11] to find the network parameters θ.…”
Section: Triplet Neural Network Back-end
confidence: 87%
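The differentiable AUC approximation mentioned here follows a standard pattern: empirical AUC counts the fraction of (target, non-target) score pairs that are correctly ordered, and the non-differentiable indicator is relaxed with a sigmoid. A minimal sketch, assuming a sharpness parameter `delta` that is not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_auc(target_scores, nontarget_scores, delta=1.0):
    # Differentiable surrogate of AUC: replace the indicator
    # 1[s_tar > s_non] with sigmoid(delta * (s_tar - s_non))
    # averaged over all target/non-target score pairs.
    diff = target_scores[:, None] - nontarget_scores[None, :]
    return float(sigmoid(delta * diff).mean())

tar = np.array([2.0, 1.5, 0.9])
non = np.array([-1.0, 0.2, -0.5, 0.1])
print(soft_auc(tar, non))
```

As `delta` grows, the surrogate approaches the exact (non-differentiable) AUC; maximizing it with gradient descent trains the network parameters θ directly for the ranking behavior the AUC measures.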
“…For training, triplet neural networks define a loss function that aims to maximize the similarity between a pair of examples belonging to the same identity or language, while minimizing the similarity with an example from another subject or language. Following the philosophy of this kind of network and applying the new loss function that we proposed in [11], we will show that a system trained with this approach is more suitable for the language recognition task than traditional back-ends.…”
Section: Introduction
confidence: 99%
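The triplet objective described in this statement can be written compactly. This is a generic margin-based triplet loss on cosine similarity, shown for illustration; it is not the specific loss proposed in [11].

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge on the similarity gap: pull the same-identity pair
    # (anchor, positive) together and push the different-identity
    # pair (anchor, negative) apart by at least `margin`.
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))

a = np.array([1.0, 0.0])
p = np.array([1.0, 0.1])   # same identity: similar direction
n = np.array([0.0, 1.0])   # different identity: orthogonal
print(triplet_loss(a, p, n))  # 0.0 -- triplet already satisfied
```

The loss is zero once the same-identity similarity exceeds the cross-identity similarity by the margin, so training focuses on triplets that still violate the constraint.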
“…Results confirm that the alignment as a layer within the DNN architecture is an interesting approach, since we have obtained competitive results with a straightforward, low-cost alignment technique, so we strongly believe that it can achieve better results with more powerful techniques such as GMM or DNN posteriors. As a first approximation to the proposed future work, we have an ongoing development using GMM as a new alignment technique, which is producing good preliminary results [29].…”
Section: Discussion
confidence: 99%
“…The AUC optimization [17] is a special case of the pAUC optimization with α = 0 and β = 1. It is known that the performance of a speaker verification system is determined by the discriminability of the difficult trials.…”
Section: Connection To AUC Maximization
confidence: 99%
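The relation between pAUC and AUC stated here can be made concrete with an empirical sketch. Partial AUC restricts the pairwise comparison to the non-target scores falling in a false-positive-rate window [α, β], which for small β means the highest-scoring (hardest) impostor trials; with α = 0 and β = 1 every non-target score is kept and the full AUC is recovered. This rank-window selection is one common way to compute empirical pAUC, shown here as an assumption rather than the formulation of [17]:

```python
import numpy as np

def empirical_pauc(tar, non, alpha=0.0, beta=1.0):
    # Empirical partial AUC over the FPR range [alpha, beta]:
    # keep only the non-target scores whose ranks fall inside that
    # window (sorted descending, so the hardest impostor trials
    # come first), then count correctly ordered pairs.
    non_sorted = np.sort(non)[::-1]
    lo = int(alpha * len(non))
    hi = int(np.ceil(beta * len(non)))
    hard = non_sorted[lo:hi]
    return float((tar[:, None] > hard[None, :]).mean())

tar = np.array([2.0, 1.5, 0.9])
non = np.array([-1.0, 0.2, -0.5, 0.1])
print(empirical_pauc(tar, non, alpha=0.0, beta=1.0))  # 1.0 (full AUC)
```

Shrinking β concentrates the objective on exactly the difficult trials the statement refers to, which is why pAUC optimization targets the operating region that matters for verification.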
“…The state-of-the-art text-independent speaker verification systems [1][2][3][4] use deep neural networks (DNNs) to project speech recordings of different lengths into a common low-dimensional embedding space where the speakers' identities are represented. Such a method is called deep embedding, where the embedding networks have three key components: network structure [1,3,[5][6][7], pooling layer [1,[8][9][10][11][12], and loss function [13][14][15][16][17]. This paper focuses on the last part, i.e., the loss functions.…”
Section: Introduction
confidence: 99%