Interspeech 2016
DOI: 10.21437/interspeech.2016-339

Improving i-Vector and PLDA Based Speaker Clustering with Long-Term Features

Cited by 7 publications (6 citation statements)
References: 14 publications
“…In our work, we have proposed the extraction of i-vectors from the short-term cepstral and long-term speech features and the fusion of their cosine-distance and PLDA scores. These results have already been published in our previous works of [21,22].…”
Section: Introduction (supporting)
confidence: 85%
“…In all of our previous works [9,10,21,22], only the static MFCCs were used. The deltas were not used in these works.…”
Section: Introduction (mentioning)
confidence: 99%
“…If the score is greater than a pre-defined threshold τ, x is accepted as a reference speaker's utterance; otherwise, it is rejected. The input observation x can be a raw speech waveform itself or an encoded vector using various feature extraction algorithms for speaker verification such as Mel-frequency cepstral coefficients (MFCCs) [25], i-vector [26][27][28][29], or speaker embedding vectors [5,7,8,15]. In this paper, we model the raw speech .., xe M }, and we define the score function f(·, ·) based on the cosine similarity: Fig.…”
Section: Speaker Verification (mentioning)
confidence: 99%
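The accept/reject rule quoted in the statement above can be illustrated with a minimal sketch. It assumes the two inputs are already fixed-length vectors (for example i-vectors or embeddings); the function names `cosine_similarity` and `verify`, the example vectors, and the threshold value are hypothetical and are not taken from the citing paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def verify(test_vec, reference_vec, tau):
    """Accept the test utterance if its score exceeds the threshold tau.

    The score function here is plain cosine similarity; this is an
    illustrative assumption, not the cited authors' implementation.
    """
    return cosine_similarity(test_vec, reference_vec) > tau


# Hypothetical 3-dimensional vectors; real i-vectors have hundreds of dimensions.
reference = [0.9, 0.1, 0.3]
same_speaker = [0.8, 0.2, 0.35]
other_speaker = [-0.5, 0.9, -0.1]

print(verify(same_speaker, reference, tau=0.5))
print(verify(other_speaker, reference, tau=0.5))
```

In practice the threshold τ would be tuned on held-out trials rather than fixed by hand as it is in this sketch.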
“…The clustering assigns a label set Y = {y1, ..., yN} to X, and yi ∈ {1, ..., K}. Each observation xi of dimension D can be a speech utterance itself or an encoded vector using various feature extraction algorithms for speaker clustering such as Mel frequency cepstral coefficients (MFCCs) [18], glottal to noise excitation ratio (GNE) [8], i-vector [11,14,10,8] and MBN [7]. In this paper, we use the i-vector for the feature vector of an observation, and it can be obtained by:…”
Section: Speaker Clustering (mentioning)
confidence: 99%
“…In addition, the speaker's voice identification and verification [4,5,6] are becoming attractive features for user-specific services. To provide such services, speaker clustering [7,8] plays a key role in identifying the number of speakers and grouping the utterances from the same user for the automatic user-specific model generation or speaker diarization [9,10].…”
Section: Introduction (mentioning)
confidence: 99%