2019
DOI: 10.1121/1.5119240

On the limits of automatic speaker verification: Explaining degraded recognizer scores through acoustic changes resulting from voice disguise

Abstract: In speaker verification research, objective performance benchmarking of listeners and automatic speaker verification (ASV) systems are of key importance in understanding the limits of speaker recognition. While the adoption of common data and metrics has been instrumental to progress in ASV, there are two major shortcomings. First, the utterances lack intentional voice changes imposed by the speaker. Second, the standard evaluation metrics focus on average performance across all speakers and trials. As a resul…

Cited by 11 publications (9 citation statements)
References 43 publications
“…In the present study, we analyze the speaker variation of the entire VoxCeleb1's dataset, with speech from all the 1251 speakers, 561 female and 690 male, comprising 121,350 and 168,571 same speaker trials respectively. In contrast to our previous study where speakers were asked to disguise their voices [5], the speech variations in the VoxCeleb dataset correspond to the circumstances in which they are performed -whether a live-show interview with an audience, a radio or TV program in a formal or informal atmosphere. [11].…”
Section: Experimental Setup 3.1 VoxCeleb Corpus (mentioning)
confidence: 96%
“…It is therefore plausible that target speakers may get easily missed on VoxCeleb data, too. Thus, another aim of our work is to address generalizability of our earlier findings [5] (for acted voice data) to contemporary speech present in the VoxCeleb dataset.…”
Section: Introduction (mentioning)
confidence: 97%