2020
DOI: 10.1558/ijsll.39778

Tuning the performance of automatic speaker recognition in different conditions

Abstract: Automatic speaker recognition applications have often been described as a ‘black box’. This study explores the benefit of tuning procedures (condition adaptation and reference normalisation) implemented in an i-vector PLDA framework ASR system, VOCALISE. These procedures enable users to open the black box to a certain degree. Subsets of two 100-speaker databases, one of Czech and the other of Persian male speakers, are used for the baseline condition and for the tuning procedures. The effect of tuning with cro…

Cited by 2 publications (2 citation statements)
References 27 publications
“…Fabien and Motlicek [31] investigated the performance of x‐vector models in forensic scenarios, but with acted speech, and the study did not focus on the effect of multilingualism (although the dataset was multilingual). The study by Skarnitzl and colleagues [32] is specifically related to forensic research and evaluated multilingual scenarios but uses an earlier version of the VOCALIZE [33] system based on the obsolete i‐vector.…”
Section: Introduction
Type: mentioning
confidence: 99%
“…Apart from the three comparisons listed above, we conducted several partial comparisons to examine the effect of "tuning" (see Skarnitzl et al, 2019) using condition adaptation. Condition adaptation optimizes the ASR system to new conditions, specific to the given recordings, by adapting the LDA and PLDA models.…”
Section: Automatic Speaker Recognition Procedures
Type: mentioning
confidence: 99%