Interspeech 2014
DOI: 10.21437/interspeech.2014-422
Nearest neighbor discriminant analysis for robust speaker recognition

Cited by 11 publications (8 citation statements)
References 0 publications
“…It is well known in the speaker recognition community that the actual distribution of i-vectors may not necessarily be Gaussian [12]. This is particularly problematic when speech recordings are collected in the presence of noise and channel distortions [9,13]. In addition, for the NIST SRE type of scenarios, speech recordings come from various sources and collections (sometimes out-of-domain); therefore, unimodality of the distributions cannot be guaranteed.…”
Section: Linear Discriminant Analysis (LDA) (mentioning)
confidence: 99%
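The passage above refers to the class-mean based formulation of Fisher LDA, which is where the Gaussian/unimodal assumption enters. The sketch below is a minimal Python/NumPy illustration, not code from the cited papers; the function name and array shapes are assumptions. It shows that each speaker class is summarized only by its single mean vector, so a multimodal or non-Gaussian class distribution is poorly captured.

```python
import numpy as np

def lda_scatter_matrices(ivectors, labels):
    """Fisher LDA scatter matrices (illustrative sketch).

    ivectors : (N, D) array of i-vectors
    labels   : (N,)   array of speaker labels
    """
    ivectors = np.asarray(ivectors)
    labels = np.asarray(labels)
    dim = ivectors.shape[1]
    global_mean = ivectors.mean(axis=0)
    S_w = np.zeros((dim, dim))  # within-class scatter
    S_b = np.zeros((dim, dim))  # between-class scatter
    for spk in np.unique(labels):
        X = ivectors[labels == spk]
        mu = X.mean(axis=0)          # the class is reduced to this one mean
        centered = X - mu
        S_w += centered.T @ centered
        diff = (mu - global_mean).reshape(-1, 1)
        S_b += X.shape[0] * (diff @ diff.T)
    return S_w, S_b

# LDA projection directions are the leading eigenvectors of inv(S_w) @ S_b.
```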
“…Particularly, we first describe the key components that contribute to significant improvements in the performance of our system. These components include: 1) a nearest-neighbor based discriminant analysis (NDA) approach [9] for channel compensation in the i-vector space, which, unlike the commonly used Fisher LDA, is non-parametric and typically of full rank, 2) speaker- and channel-adapted features derived from feature-space maximum likelihood linear regression (fMLLR) transforms [10,11], which are used both to train/evaluate the DNN and to compute the sufficient Baum-Welch statistics for i-vector extraction, and 3) a DNN acoustic model with a large number of output units (~10k senones) to compute the soft alignments (i.e., the posteriors). To quantify the contribution of these components, we evaluate our system in the context of speaker verification experiments using speech material from the NIST 2010 speaker recognition evaluation (SRE), which includes 5 extended core conditions involving telephone and microphone trials.…”
Section: Introduction (mentioning)
confidence: 99%
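The quoted description uses DNN senone posteriors as soft alignments when accumulating the sufficient Baum-Welch statistics for i-vector extraction. Below is a minimal sketch of that accumulation step; the function name and array shapes are assumptions, and real systems compute these statistics per utterance inside a full i-vector extractor pipeline.

```python
import numpy as np

def baum_welch_stats(features, posteriors):
    """Accumulate zeroth- and first-order statistics from soft alignments.

    features   : (T, D) acoustic frames (e.g., fMLLR-transformed features)
    posteriors : (T, C) per-frame senone posteriors from the DNN
    Returns
        N : (C,)   zeroth-order stats (soft frame counts per senone)
        F : (C, D) first-order stats (posterior-weighted feature sums)
    """
    features = np.asarray(features)
    posteriors = np.asarray(posteriors)
    N = posteriors.sum(axis=0)
    F = posteriors.T @ features
    return N, F
```

These statistics play the same role as the GMM-UBM occupancies in a conventional i-vector extractor; only the alignment model supplying the posteriors changes.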
“…However, there are some limitations associated with parametric LDA, where the underlying distribution of each class is assumed to be Gaussian and unimodal. It is well known in the speaker recognition community that the actual distribution of i-vectors may not necessarily be Gaussian [18], particularly in the presence of noise and channel distortions [15,19]. In addition, for the NIST SRE scenarios, speech recordings come from various sources (sometimes out-of-domain); therefore, unimodality of the distributions cannot be guaranteed.…”
Section: Nearest-Neighbor Discriminant Analysis (NDA) (mentioning)
confidence: 99%
“…In order to alleviate some of the limitations identified for LDA, a nonparametric nearest-neighbor based discriminant analysis (NDA) technique was proposed in [20] and recently evaluated for both speaker and language recognition tasks on high-frequency (HF) radio channel-degraded data [15,19], where it compared favorably to LDA. In NDA, the expected values that represent the global information about each class are replaced with local sample averages computed from the k nearest neighbors (k-NN) of individual samples.…”
Section: Nearest-Neighbor Discriminant Analysis (NDA) (mentioning)
confidence: 99%
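To make the k-NN local-mean idea concrete, the sketch below is an illustrative Python/NumPy example, not the formulation in [20]: the function names, the choice of k, and the omission of the NDA weighting function are assumptions. It replaces the global class means used by LDA with local averages over each sample's nearest neighbors from competing classes when building the between-class scatter.

```python
import numpy as np

def local_mean(x, other_class, k):
    """Mean of the k nearest neighbors of x drawn from another class."""
    dists = np.linalg.norm(other_class - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return other_class[nearest].mean(axis=0)

def nda_between_scatter(ivectors, labels, k=10):
    """Nonparametric between-class scatter (sketch; sample weights omitted).

    For every sample, scatter is accumulated toward the local k-NN mean of
    each competing class instead of toward that class's global mean.
    """
    ivectors = np.asarray(ivectors)
    labels = np.asarray(labels)
    dim = ivectors.shape[1]
    S_b = np.zeros((dim, dim))
    classes = np.unique(labels)
    for ci in classes:
        Xi = ivectors[labels == ci]
        for cj in classes:
            if cj == ci:
                continue
            Xj = ivectors[labels == cj]
            for x in Xi:
                diff = (x - local_mean(x, Xj, k)).reshape(-1, 1)
                S_b += diff @ diff.T
    return S_b
```

In the common formulation, the within-class scatter is computed as in LDA, and the NDA basis is then obtained from the same generalized eigenvalue problem on the two scatter matrices.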