2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017
DOI: 10.1109/icassp.2017.7953182

Exploiting sequence information for text-dependent Speaker Verification

Abstract: Model-based approaches to Speaker Verification (SV), such as Joint Factor Analysis (JFA), i-vector and relevance Maximum a Posteriori (MAP), have been shown to provide state-of-the-art performance for text-dependent systems with fixed phrases. The performance of i-vector and JFA models has been further enhanced by estimating posteriors from a Deep Neural Network (DNN) instead of a Gaussian Mixture Model (GMM). While both DNNs and GMMs aim at incorporating phonetic information of the phrase with these posteriors, model-b…

Cited by 9 publications (8 citation statements)
References 19 publications

“…In our previous work [12,16], we used online i-vectors as features to Dynamic Time Warping (DTW) algorithm for fixed phrase based text-dependent SV task. Significant gain in performance was observed as opposed to using the conventional i-vectors which suggests that these features contain sufficient speaker and content information.…”
Section: Content Normalization Using I-vectors; citation type: mentioning
confidence: 99%
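
The statement above describes feeding per-frame "online" i-vectors to DTW for fixed-phrase verification. As a rough illustration only, the alignment step can be sketched as follows. This is a minimal sketch, not the cited authors' implementation: the function names, the cosine frame distance, and the normalization by the summed sequence lengths are all assumptions made for this example.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two frame-level vectors (assumed frame metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def dtw_distance(seq_a, seq_b, dist=cosine_distance):
    """Length-normalized DTW cost between two sequences of vectors.

    seq_a and seq_b stand in for sequences of per-frame features
    (hypothetically, online i-vectors from enrollment and test utterances).
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    # Normalization by n + m is an assumption chosen for this sketch.
    return cost[n][m] / (n + m)
```

A lower returned cost means the two utterances align more closely in both content and frame-level features; in a verification setting the cost would be compared against a threshold tuned on development data.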
“…However, a limitation of the d-Vector and triplet-loss based approaches is that they ignore the content-information of the speech signal completely. Many studies have shown that exploiting the phonetic variability can significantly enhance the performance of SV systems [5,4,13,14,15]. Motivated by these results, we incorporated this information on top of DNN based speaker embedding (from the d-Vector and triplet-loss network) for the Random-digit task using content-matching [5].…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…Text-dependent SV can be implemented in various ways [7,6,8,9,10] (such as phrase, seen-content, random-digit, or short commands-based authentication). In this paper, we are interested in fixed-phrase and random-digit type of text-dependent SV systems.…”
Section: Introduction; citation type: mentioning
confidence: 99%