Interspeech 2019
DOI: 10.21437/interspeech.2019-1435
The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge

Abstract: In this paper, we present the DKU system for the speaker recognition task of the VOiCES from a distance challenge 2019. We investigate the whole system pipeline for far-field speaker verification, including data pre-processing, short-term spectral feature representation, utterance-level speaker modeling, backend scoring, and score normalization. Our best single system employs a residual neural network trained with angular softmax loss. Also, the weighted prediction error algorithms can further improve performance […]
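The abstract's best single system couples a residual network with an angular softmax loss. Below is a minimal sketch of an angular-margin classification head in PyTorch; the additive-margin form and the margin/scale values are illustrative assumptions and may differ from the exact angular softmax variant used by the authors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AngularMarginSoftmax(nn.Module):
    """Additive angular-margin classification head (illustrative variant)."""

    def __init__(self, embed_dim: int, num_speakers: int,
                 margin: float = 0.2, scale: float = 30.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_speakers, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.margin = margin
        self.scale = scale

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine of the angle between normalised embeddings and class weights.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
        # Penalise only the target class by adding the margin to its angle.
        target = F.one_hot(labels, num_classes=cosine.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cosine)
        return F.cross_entropy(self.scale * logits, labels)

# loss = AngularMarginSoftmax(embed_dim=256, num_speakers=6000)(embeddings, speaker_ids)
```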

Cited by 16 publications (14 citation statements). References 37 publications.

“…After training, the speaker embedding adopts cosine similarity for scoring. In the deep speaker embedding system with ResNet + GAP setting, a cosine similarity backend is sufficient to achieve good performance [4,22]. For training data, the original clean speech is used to train the deep speaker embedding system.…”
Section: Single-channel Training Results
Mentioning confidence: 99%
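As the statement above notes, a plain cosine backend can be sufficient once ResNet + GAP embeddings are trained. A minimal sketch of such a backend in NumPy, with hypothetical variable names; the decision threshold would be tuned on a development set.

```python
import numpy as np

def cosine_score(enroll_emb: np.ndarray, test_emb: np.ndarray) -> float:
    """Score a verification trial as the cosine of the angle between embeddings."""
    enroll_emb = enroll_emb / np.linalg.norm(enroll_emb)
    test_emb = test_emb / np.linalg.norm(test_emb)
    return float(enroll_emb @ test_emb)

# A trial is accepted when the score exceeds a threshold tuned on a dev set:
# accept = cosine_score(enroll_emb, test_emb) > threshold
```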
“…DNN based denoising methods for single-channel speech enhancement [10,11,12,13] and beamforming for multi-channel speech enhancement [9,14,15] have also been investigated for ASV under complex environment. At feature level, sub-band Hilbert envelopes based features [16,17,18], warped minimum variance distortionless response (MVDR) cepstral coefficients [19], blind spectral weighting (BSW) based features [20], power-normalized cepstral coefficients (PNCC) [21,22] and DNN bottleneck features [23] have been applied to ASV system to suppress the adverse impacts of reverberation and noise. At the model level, reverberation matching with multi-condition training models have been successfully employed within the universal background model (UBM) or i-vector based front-end systems [24,25].…”
Section: Introduction
Mentioning confidence: 99%
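Among the multi-channel enhancement methods cited above, one common beamformer design is MVDR, with weights w = R_n^{-1} d / (d^H R_n^{-1} d). The sketch below computes per-frequency-bin MVDR weights in NumPy; the noise covariance and steering vector are assumed to be estimated elsewhere, and all names are illustrative.

```python
import numpy as np

def mvdr_weights(noise_cov: np.ndarray, steering: np.ndarray) -> np.ndarray:
    """MVDR weights for one frequency bin: w = R_n^{-1} d / (d^H R_n^{-1} d)."""
    rn_inv_d = np.linalg.solve(noise_cov, steering)  # R_n^{-1} d
    return rn_inv_d / (steering.conj() @ rn_inv_d)   # unit gain in the look direction

def apply_beamformer(weights: np.ndarray, stft_frames: np.ndarray) -> np.ndarray:
    """Combine multi-channel STFT frames (channels x frames) into one channel."""
    return weights.conj() @ stft_frames              # y[t] = w^H x[t]

# Example with random placeholders: 4 microphones, 200 frames in one frequency bin.
# rng = np.random.default_rng(0)
# x = rng.standard_normal((4, 200)) + 1j * rng.standard_normal((4, 200))
# R_n = x @ x.conj().T / x.shape[1]
# y = apply_beamformer(mvdr_weights(R_n, np.ones(4, dtype=complex)), x)
```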
“…The superiority of deep speaker embedding systems has been shown in text-independent speaker recognition for close-talking [21,22] and far-field scenarios [24,25]. In this paper, we adopt the deep speaker embedding system, which is initially designed for text-independent speaker verification, as the baseline.…”
Section: Model Architecture
Mentioning confidence: 99%
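The deep speaker embedding baseline referred to above is described in the earlier citation statements as a ResNet with global average pooling. A minimal sketch of such an extractor, using torchvision's ResNet-34 as a stand-in backbone on single-channel spectrograms; the embedding dimension and input shape are assumptions, not the authors' exact configuration (requires torchvision >= 0.13 for the `weights=None` argument).

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

class SpeakerEmbeddingNet(nn.Module):
    """ResNet backbone + global average pooling -> fixed-size speaker embedding."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        backbone = resnet34(weights=None)
        # Spectrograms have a single input channel, not three RGB channels.
        backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # The backbone already applies global average pooling before its final
        # linear layer; replace that ImageNet classifier with an embedding projection.
        backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)
        self.backbone = backbone

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        # spectrogram: (batch, 1, freq_bins, frames) -> (batch, embed_dim)
        return self.backbone(spectrogram)

# embeddings = SpeakerEmbeddingNet()(torch.randn(8, 1, 64, 300))
```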
“…In backend modeling, multi-condition training of probabilistic linear discriminant analysis (PLDA) models was employed in i-vector system [25]. The robustness of deep speaker embeddings for far-field text-independent speech has also been investigated in [26,27]. Finally, at the score level, score normalization [22] and multi-channel score fusion [28] have been applied in far-field ASV system to improve the robustness.…”
Section: Introduction
Mentioning confidence: 99%
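Score normalization, mentioned at the score level above, is commonly realized as symmetric normalization (S-norm) against a cohort of imposter embeddings. A hedged NumPy sketch follows; the cohort and the cosine scoring convention are assumptions for illustration, not necessarily the normalization used in the paper.

```python
import numpy as np

def s_norm(raw_score: float, enroll_emb: np.ndarray, test_emb: np.ndarray,
           cohort_embs: np.ndarray) -> float:
    """Symmetric score normalisation against a cohort of imposter embeddings."""

    def cohort_stats(emb: np.ndarray):
        # Cosine scores between one embedding and every cohort embedding.
        emb = emb / np.linalg.norm(emb)
        cohort = cohort_embs / np.linalg.norm(cohort_embs, axis=1, keepdims=True)
        scores = cohort @ emb
        return scores.mean(), scores.std()

    mu_e, sigma_e = cohort_stats(enroll_emb)
    mu_t, sigma_t = cohort_stats(test_emb)
    return 0.5 * ((raw_score - mu_e) / sigma_e + (raw_score - mu_t) / sigma_t)
```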