Interspeech 2019
DOI: 10.21437/interspeech.2019-2471

Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge

Abstract: This paper is a post-evaluation analysis of our efforts in the VOiCES 2019 Speaker Recognition challenge. All systems in the fixed condition are based on x-vectors with different features and DNN topologies. The single best system reaches a minDCF of 0.38 (5.25% EER), and a fusion of 3 systems yields a minDCF of 0.34 (4.87% EER). We also analyze how speaker verification (SV) systems have evolved over the last few years and additionally report results on the SITW 2016 challenge. EER on the core-core condition of the SITW 2016 challenge dropped …
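The minDCF and EER numbers quoted in the abstract are standard speaker-verification operating-point metrics computed from trial scores. A simplified sketch of both (an illustration only, not the official NIST scoring tool, and with unit error costs assumed for the DCF):

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Equal Error Rate: the operating point where miss rate equals false-alarm rate."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones_like(target_scores), np.zeros_like(nontarget_scores)])
    order = np.argsort(scores)          # sweep thresholds from low to high
    labels = labels[order]
    n_tar = labels.sum()
    n_non = len(labels) - n_tar
    miss = np.cumsum(labels) / n_tar            # targets rejected at each threshold
    fa = 1.0 - np.cumsum(1 - labels) / n_non    # non-targets accepted at each threshold
    idx = np.argmin(np.abs(miss - fa))          # closest crossing point
    return (miss[idx] + fa[idx]) / 2.0

def min_dcf(target_scores, nontarget_scores, p_tar=0.05, c_miss=1.0, c_fa=1.0):
    """Minimum detection cost over all thresholds, normalized by the trivial system's cost."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    best = np.inf
    for t in thresholds:
        p_miss = np.mean(target_scores < t)
        p_fa = np.mean(nontarget_scores >= t)
        dcf = c_miss * p_tar * p_miss + c_fa * (1 - p_tar) * p_fa
        best = min(best, dcf)
    return best / min(c_miss * p_tar, c_fa * (1 - p_tar))
```

A perfectly separating system scores 0 on both metrics; the exact minDCF value reported in the challenge depends on the target prior and cost parameters defined in the evaluation plan.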


Cited by 17 publications (10 citation statements)
References 19 publications
“…For this purpose, using a public recipe in Kaldi is a reasonable choice. Readers can refer to [23], [24] for SOTA performance on the same task.…”
Section: Methods
confidence: 99%
“…This suggests that there should be little difference between attentive … For multi-head attentive pooling, the feature sequence ({h_{c,t}}_{t=0}^{T−1} in (5)) in the first row corresponds to an utterance randomly selected from the VoxCeleb1 development set. For attentive STSP, the feature sequence is a random row vector in G of (12). Note that the unit on the horizontal axis is the frame index t in (5) and (9).…”
Section: A. Performance On Various Evaluations
confidence: 99%
“…Almost all systems proposed during the challenge exploited different neural network architectures to obtain deep speaker representations. To reduce the effects of room reverberation and various kinds of distortion, some studies used more accurate task-oriented data augmentation [15,16,17,18] and speech enhancement methods [16] based on single-channel weighted prediction error (WPE) [19].…”
Section: Speaker Embeddings For Distant Speaker Recognition
confidence: 99%
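The task-oriented augmentation mentioned above typically means simulating far-field conditions by convolving clean speech with room impulse responses (RIRs). A minimal illustration of that idea (a sketch only, not the cited systems' augmentation pipeline):

```python
import numpy as np

def reverb_augment(clean, rir):
    """Simulate far-field speech by convolving a clean waveform with a room impulse response."""
    wet = np.convolve(clean, rir)[: len(clean)]   # truncate the convolution tail
    # rescale to the clean signal's peak so the augmented sample does not clip
    peak = np.max(np.abs(wet))
    if peak > 0:
        wet = wet * (np.max(np.abs(clean)) / peak)
    return wet
```

In practice, challenge systems drew RIRs (and additive noise) from dedicated corpora so the augmented training data matched the reverberant evaluation conditions; WPE-based enhancement then tackles the same reverberation at test time.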