Interspeech 2015
DOI: 10.21437/interspeech.2015-87

Non-linear PLDA for i-vector speaker verification

Cited by 36 publications (13 citation statements)
References 6 publications
“…On the front-end level, let us mention for example using NN bottleneck features (BNF) instead of conventional Mel Frequency Cepstral Coefficients (MFCC, Lozano-Diez et al, 2016) or simply concatenating BNF and MFCCs (Matějka et al, 2016) which greatly improves the performance and increases system robustness. Higher in the modeling pipeline, NN acoustic models can be used instead of Gaussian Mixture Models (GMM) for extraction of sufficient statistics (Lei et al, 2014) or for either complementing PLDA (Novoselov et al, 2015;Bhattacharya et al, 2016) or replacing it (Ghahabi and Hernando, 2014).…”
Section: Introduction (mentioning)
confidence: 99%
“…Most of the attempts have replaced or improved one of the components of an i-vector + PLDA system (feature extraction, calculation of sufficient statistics, i-vector extraction or PLDA) with a neural network. As examples, let us mention: using NN bottleneck features instead of conventional MFCC features [1], NN acoustic models replacing Gaussian Mixture Models for extraction of sufficient statistics [2], NNs for either complementing PLDA [3,4] or replacing it [5]. More ambitiously, NNs that take the frame level features of an utterance as input and directly produce an utterance level representation-usually referred to as an embedding-have in the past two years almost replaced the generative i-vector approach in text independent speaker recognition [6,7,8,9,10,11,12].…”
Section: Introduction (mentioning)
confidence: 99%
“…This work investigates prominent techniques from speaker recognition field combined with face recognition and general deep learning science to bring new thoughts on how speaker recognition systems can be developed. I-vector-based systems are well known to be state-of-the-art solutions to the text-independent speaker verification problem [1,2,3]. The i-vector framework has inspired deep learning system design in this field.…”
Section: Introduction (mentioning)
confidence: 99%