The National Institute of Standards and Technology (NIST) conducts an ongoing series of Speaker Recognition Evaluations (SRE). Speaker detection performance is measured using a detection cost function, defined as a weighted sum of the probabilities of type I and type II errors. Sampling variability can result in measurement uncertainty, so the uncertainties of detection cost functions must be taken into account in SRE. In our prior study, data independence was assumed when applying the nonparametric two-sample bootstrap method, which rests on our extensive bootstrap variability studies on large datasets, to compute the standard errors (SEs) of detection cost functions. In this article, the data dependency caused by multiple uses of the same subjects is taken into account. The data are therefore grouped into target sets and non-target sets, each containing multiple scores. One-layer and two-layer bootstrap methods are proposed: in the one-layer method, the two-sample bootstrap resampling takes place only on the target sets and non-target sets, respectively; in the two-layer method, it subsequently also takes place on the target scores and non-target scores within the sets. The SEs of the detection cost function obtained with these two methods are compared with those obtained under the assumption of data independence. It is found that data dependency increases both the estimated SEs and the variation of the SEs. Thus, to obtain more accurate measures in SRE, the data should be sampled randomly. Based on our research, some suggestions regarding test design are provided.
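The one-layer and two-layer resampling schemes described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fixed decision threshold, the cost weights (`c_miss`, `c_fa`, `p_target`) and the synthetic scores are all assumptions chosen only for illustration.

```python
import numpy as np

def detection_cost(target_scores, nontarget_scores, threshold,
                   c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted sum of miss and false-alarm probabilities.

    The weights are illustrative assumptions, not values from the source.
    """
    p_miss = np.mean(target_scores < threshold)      # targets rejected
    p_fa = np.mean(nontarget_scores >= threshold)    # non-targets accepted
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

def resample_sets(sets, rng, two_layer):
    """Resample whole sets with replacement (layer 1); if two_layer is
    True, also resample the scores inside each chosen set (layer 2)."""
    idx = rng.integers(0, len(sets), size=len(sets))
    chosen = [sets[i] for i in idx]
    if two_layer:
        chosen = [rng.choice(s, size=len(s), replace=True) for s in chosen]
    return np.concatenate(chosen)

def bootstrap_se(target_sets, nontarget_sets, threshold,
                 n_boot=500, two_layer=False, seed=0):
    """Standard error of the detection cost from bootstrap replicates.

    Target sets and non-target sets are resampled separately, which is
    the two-sample part of the scheme.
    """
    rng = np.random.default_rng(seed)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        t = resample_sets(target_sets, rng, two_layer)
        n = resample_sets(nontarget_sets, rng, two_layer)
        reps[b] = detection_cost(t, n, threshold)
    return reps.std(ddof=1)

# Synthetic example: each set shares a random offset, so scores within a
# set are dependent (a stand-in for repeated use of the same subject).
rng = np.random.default_rng(42)
target_sets = [rng.normal(2.0 + rng.normal(0, 0.5), 1.0, size=20)
               for _ in range(50)]
nontarget_sets = [rng.normal(0.0 + rng.normal(0, 0.5), 1.0, size=40)
                  for _ in range(100)]

print("one-layer SE:", bootstrap_se(target_sets, nontarget_sets, 1.0))
print("two-layer SE:", bootstrap_se(target_sets, nontarget_sets, 1.0,
                                    two_layer=True))
```

Passing each score as its own single-element set reduces the sketch to an ordinary two-sample bootstrap under the independence assumption, which gives a baseline for comparison.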
In many disciplines, the nonparametric two-sample bootstrap is employed to estimate the uncertainties of measures in ROC analysis on large datasets, with or without data dependency caused by multiple uses of the same subjects, based on our studies of bootstrap variability. It would seem that an analytical approach might be used for the same purpose, but the differences between the two methods are noteworthy. The bootstrap method can intrinsically take into account how genuine scores and impostor scores are distributed, deal with data dependency, and handle the covariance that arises when the statistic is a weighted sum of two probabilities derived from two sets of data, respectively, in ROC analysis. The analytical approach cannot, and it generally underestimates the uncertainties of measures relative to the bootstrap method. The comparison was carried out using real data obtained from speaker recognition evaluations and biometric evaluations, as well as simulated data with normal distributions and nonparametric distributions, respectively.
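To make the contrast concrete, the sketch below sets one possible analytical approximation beside a plain two-sample bootstrap. The abstract does not state which analytical formula was used, so the binomial-variance form here is an assumption, as are the function names, the equal weights, the fixed threshold and the simulated normal scores. With independent scores and a fixed threshold the two estimates are expected to be close; the sketch does not model data dependency.

```python
import numpy as np

def analytical_se(target_scores, nontarget_scores, threshold,
                  w_miss=0.5, w_fa=0.5):
    """Assumed analytical form: treat the two error rates as independent
    binomial proportions and ignore any covariance between them."""
    p_miss = np.mean(target_scores < threshold)
    p_fa = np.mean(nontarget_scores >= threshold)
    var = (w_miss ** 2 * p_miss * (1 - p_miss) / len(target_scores)
           + w_fa ** 2 * p_fa * (1 - p_fa) / len(nontarget_scores))
    return float(np.sqrt(var))

def two_sample_bootstrap_se(target_scores, nontarget_scores, threshold,
                            w_miss=0.5, w_fa=0.5, n_boot=300, seed=0):
    """Resample target and non-target scores separately with replacement
    and take the standard deviation of the replicated statistic."""
    rng = np.random.default_rng(seed)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        t = rng.choice(target_scores, size=len(target_scores), replace=True)
        n = rng.choice(nontarget_scores, size=len(nontarget_scores),
                       replace=True)
        reps[b] = (w_miss * np.mean(t < threshold)
                   + w_fa * np.mean(n >= threshold))
    return float(reps.std(ddof=1))

# Simulated independent scores with normal distributions.
rng = np.random.default_rng(1)
genuine = rng.normal(2.0, 1.0, size=1000)
impostor = rng.normal(0.0, 1.0, size=2000)

print("analytical SE:", analytical_se(genuine, impostor, 1.0))
print("bootstrap SE: ", two_sample_bootstrap_se(genuine, impostor, 1.0))
```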