From single to multiple enrollment i-vectors: Practical PLDA scoring variants for speaker verification

Rajan, P.K.; Afanasyev, Anton; Hautamäki, Ville; Kinnunen, Tomi

doi:10.1016/j.dsp.2014.05.001

Cited by 59 publications

(31 citation statements)

References 14 publications


“…Specifically, each speaker with each lexicon content password is considered as a class and different phrases from the same speaker are labeled with separate classes in the PLDA model training (Larcher et al, 2014b). Since there are three target utterances for each enrollment, we used the multiple enrollment PLDA scoring approach (Rajan et al, 2014; Liu et al, 2014). Finally, we simply employed the equal weighted summation fusion approach at the score level to further enhance the performance.…”

Section: Methods (mentioning)

confidence: 99%

Speaker verification based on the fusion of speech acoustics and inverted articulatory signals

Kim

Lammert

et al. 2016

Computer Speech & Language


We propose a practical, feature-level and score-level fusion approach by combining acoustic and estimated articulatory information for both text independent and text dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with the conventional acoustic features. On text independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the performance dramatically. However, since directly measuring articulatory data is not feasible in many real world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature level and score level fusion methods and find that the overall system performance is significantly enhanced even with estimated articulatory features. Such a performance boost could be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation contain important information, we included inverted articulatory trajectories in text dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help to reject wrong password trials and improve the performance after score level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR 2015 database, respectively, for the aforementioned two tasks. Experimental results show that we achieve more than 15% relative equal error rate reduction for both speaker verification tasks.
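The citation statement above mentions an equal weighted summation fusion at the score level. A minimal sketch of that idea follows, assuming each system produces one score per trial and that the scores are already on comparable scales; the function name and the example scores are hypothetical and are not taken from the cited paper.

```python
import numpy as np


def fuse_scores_equal_weight(*system_scores):
    """Sum per-trial scores from several systems, giving each system the same weight.

    Each argument is a sequence of scores, one per trial, in the same trial order.
    Assumption: the scores are already comparable in scale across systems.
    """
    stacked = np.stack([np.asarray(s, dtype=float) for s in system_scores])
    return stacked.sum(axis=0)


# Hypothetical scores for three trials from two systems.
acoustic_scores = [1.2, -0.4, 0.3]
articulatory_scores = [0.8, -1.0, 0.5]
fused = fuse_scores_equal_weight(acoustic_scores, articulatory_scores)
```

In practice, scores from different systems are often normalized before summation; that step is left out here because the statement does not describe it.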



“…The extracted i-vectors contain channel information. In order to compensate the effect of channel, probabilistic linear discriminant analysis (PLDA) is used to compute the similarity between i-vectors of enrollment and test [70]. We use Gaussian PLDA (GPLDA) in our experiment which models the within-class covariance by a full-rank matrix.…”

Section: I-vector System (mentioning)

confidence: 99%

Optimization of data-driven filterbank for automatic speaker verification

Sarangi

Sahidullah

Saha

2020

Digital Signal Processing


Most of the speech processing applications use triangular filters spaced in mel-scale for feature extraction. In this paper, we propose a new data-driven filter design method which optimizes filter parameters from a given speech data. First, we introduce a frame-selection based approach for developing speech-signal-based frequency warping scale. Then, we propose a new method for computing the filter frequency responses by using principal component analysis (PCA). The main advantage of the proposed method over the recently introduced deep learning based methods is that it requires very limited amount of unlabeled speech-data. We demonstrate that the proposed filterbank has more speaker discriminative power than commonly used mel filterbank as well as existing data-driven filterbank. We conduct automatic speaker verification (ASV) experiments with different corpora using various classifier back-ends. We show that the acoustic features created with proposed filterbank are better than existing mel-frequency cepstral coefficients (MFCCs) and speech-signal-based frequency cepstral coefficients (SFCCs) in most cases. In the experiments with VoxCeleb1 and popular i-vector back-end, we observe 9.75% relative improvement in equal error rate (EER) over MFCCs. Similarly, the relative improvement is 4.43% with recently introduced x-vector system. We obtain further improvement using fusion of the proposed method with standard MFCC-based approach.


“…The 50 enrolment utterances were merged into 10 sessions (each being the concatenation of 5 utterances); either 1 or 10 of these sessions were used in enrolment, for the two enrolment scenarios. For PLDA, when using 10 enrolment sessions, ivectors were extracted from each session then averaged as suggested in [66]; for JFA, all features from all sessions were merged. We denote the ASV systems with 5 enrolment utterances (presented as 1 session) as GMM-UBM-5, JFA-5 or PLDA-5 and those with 50 enrolment utterances (presented as 10 sessions) as GMM-UBM-50, JFA-50 or PLDA-50.…” [Footnote 12 in the citing paper: Available at: http://www.irisa.fr/metiss/guig/spro/]

Section: Speaker Verification Systems (mentioning)

confidence: 99%

Anti-Spoofing for Text-Independent Speaker Verification: An Initial Database, Comparison of Countermeasures, and Human Performance

León

Demiroğlu

et al. 2016

IEEE/ACM Trans. Audio Speech Lang. Process.


In this paper, we present a systematic study of the vulnerability of automatic speaker verification to a diverse range of spoofing attacks. We start with a thorough analysis of the spoofing effects of five speech synthesis and eight voice conversion systems, and the vulnerability of three speaker verification systems under those attacks. We then introduce a number of countermeasures to prevent spoofing attacks from both known and unknown attackers. Known attackers are spoofing systems whose output was used to train the countermeasures, whilst an unknown attacker is a spoofing system whose output was not available to the countermeasures during training. Finally, we benchmark automatic systems against human performance on both speaker verification and spoofing detection tasks.
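The citation statement for this paper describes extracting one i-vector per enrolment session and averaging them before PLDA scoring. A minimal sketch of the averaging step alone is given below, assuming the session i-vectors are stacked as rows of a 2-D array; the function name, the 400-dimensional size, and the random data are hypothetical, and the PLDA scoring itself is not shown.

```python
import numpy as np


def average_enrollment_ivectors(session_ivectors):
    """Collapse several enrolment i-vectors into one by taking their mean.

    session_ivectors: array-like of shape (n_sessions, ivector_dim).
    Returns a single vector of shape (ivector_dim,).
    """
    x = np.asarray(session_ivectors, dtype=float)
    if x.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_sessions, ivector_dim)")
    return x.mean(axis=0)


# Hypothetical example: 10 enrolment sessions, 400-dimensional i-vectors.
rng = np.random.default_rng(0)
sessions = rng.standard_normal((10, 400))
enrolment_model = average_enrollment_ivectors(sessions)
```

The averaged vector can then be scored against a test i-vector exactly as in the single-enrolment case, which is what makes this variant convenient when the back-end only supports one enrolment vector.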

