Interspeech 2017
DOI: 10.21437/interspeech.2017-157

Using Voice Quality Features to Improve Short-Utterance, Text-Independent Speaker Verification Systems

Abstract: Due to within-speaker variability in phonetic content and/or speaking style, the performance of automatic speaker verification (ASV) systems degrades, especially when the enrollment and test utterances are short. This study examines how different types of variability influence the performance of ASV systems. Speech samples (< 2 sec) from the UCLA Speaker Variability Database containing 5 different read sentences by 200 speakers were used to study content variability. Other samples (about 5 sec) that contained speec…

Cited by 20 publications (17 citation statements)
References 20 publications
“…CPP, a measure of signal periodicity, replaced the harmonic-to-noise ratios. This set of features was effectively applied to automatic speaker verification [41,1]. The features were extracted every 10 ms using VoiceSauce software [42].…”
Section: Voice Quality Features (Vqual)
Citation type: mentioning
confidence: 99%
“…Although within-speaker variability in phonetic content and speaking style degrades the performance of speaker verification systems for short utterances [27], due to the practical complexity of the CNN architecture and the vast number of parameters that need to be trained, it is not feasible to feed the utterances to the network since it will drastically reduce the number of samples in the training set. To compensate for this shortcoming, we propose to compute the prosodic features from the whole utterances.…”
Section: Prosodic Features To Enhance Deep Coupled CNN
Citation type: mentioning
confidence: 99%
“…These voice quality parameters were utilized to represent speaker identity, and improved automatic speaker verification system performance [19,20,21]. The feature set used in this study, denoted as VQual, included F0, F1, F2, F3, harmonic amplitude differences H1-H2, H2-H4, H4-H2k, formant amplitudes A1, A2, A3, and cepstral peak prominence (CPP, [22]).…”
Section: VQual: Voice Quality Features
Citation type: mentioning
confidence: 99%
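
The VQual feature set named in the statement above can be pictured as an 11-dimensional vector per analysis frame. The following is a minimal sketch of that layout, not the authors' extraction pipeline: the function name `vqual_vector`, the input dictionary keys, and the assumption that harmonic amplitudes are given in dB (so each "difference" is a plain subtraction) are all illustrative. Whether formant-corrected harmonic amplitudes were used is not stated in the excerpt, so no correction is applied here.

```python
# Order of the 11 VQual features as listed in the citation statement.
VQUAL_FEATURES = [
    "F0", "F1", "F2", "F3",          # pitch and first three formant frequencies
    "H1-H2", "H2-H4", "H4-H2k",      # harmonic amplitude differences
    "A1", "A2", "A3",                # formant amplitudes
    "CPP",                           # cepstral peak prominence
]


def vqual_vector(frame):
    """Assemble one VQual feature vector from per-frame measurements.

    `frame` is a hypothetical dict holding F0, F1-F3, the harmonic
    amplitudes H1, H2, H4, H2k (assumed to be in dB), the formant
    amplitudes A1-A3, and CPP.
    """
    derived = dict(frame)
    # Assumption: amplitudes are in dB, so differences are subtractions.
    derived["H1-H2"] = frame["H1"] - frame["H2"]
    derived["H2-H4"] = frame["H2"] - frame["H4"]
    derived["H4-H2k"] = frame["H4"] - frame["H2k"]
    return [derived[name] for name in VQUAL_FEATURES]


# Illustrative values only; they are not taken from any dataset.
example_frame = {
    "F0": 120.0, "F1": 500.0, "F2": 1500.0, "F3": 2500.0,
    "H1": 40.0, "H2": 35.0, "H4": 30.0, "H2k": 20.0,
    "A1": 38.0, "A2": 28.0, "A3": 18.0,
    "CPP": 22.0,
}
example_vector = vqual_vector(example_frame)
```

In practice the raw measurements would come from a tool such as VoiceSauce, which one of the citing papers reports using at a 10 ms frame interval; the dictionary here merely stands in for one such frame.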