We present two techniques that are shown to improve Keyword Spotting (KWS) performance under the ATWV/MTWV performance measures: (i) score normalization, in which the scores of different keywords are made commensurate with each other and correspond more closely to the probability of being correct than raw posteriors do; and (ii) system combination, in which the detections of multiple systems are merged and their scores are interpolated with weights optimized using MTWV as the maximization criterion. Both score normalization and system combination yield significant gains in ATWV/MTWV, sometimes on the order of 8-10 points (absolute), across five different languages. A variant of these methods achieved the highest performance in the official surprise-language evaluation for the IARPA-funded Babel project in April 2013.
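The abstract does not spell out the exact normalization formula or merging rules, so the sketch below only illustrates one common realization of the two ideas: keyword-level sum-to-one normalization of raw detection scores, followed by merging the detection lists of two systems and linearly interpolating the scores of matched detections. The function names, the time-overlap tolerance, and the interpolation weights `w_a`/`w_b` (which would in practice be tuned to maximize MTWV on development data) are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def sum_to_one_normalize(detections, gamma=1.0):
    """Keyword-dependent normalization: each detection's score is divided by the
    sum of (exponentiated) scores of all detections of the same keyword, so that
    scores of different keywords become comparable."""
    totals = defaultdict(float)
    for d in detections:
        totals[d["keyword"]] += d["score"] ** gamma
    return [{**d, "score": (d["score"] ** gamma) / totals[d["keyword"]]}
            for d in detections]

def combine_systems(dets_a, dets_b, w_a=0.5, w_b=0.5, tol=0.5):
    """Merge two systems' detections: hits of the same keyword whose times fall
    within `tol` seconds are treated as the same event and their scores are
    linearly interpolated; unmatched detections keep their own weighted score."""
    combined, used_b = [], set()
    for a in dets_a:
        match = next((j for j, b in enumerate(dets_b)
                      if j not in used_b and b["keyword"] == a["keyword"]
                      and abs(b["time"] - a["time"]) <= tol), None)
        if match is not None:
            used_b.add(match)
            score = w_a * a["score"] + w_b * dets_b[match]["score"]
        else:
            score = w_a * a["score"]
        combined.append({**a, "score": score})
    combined += [{**b, "score": w_b * b["score"]}
                 for j, b in enumerate(dets_b) if j not in used_b]
    return combined

if __name__ == "__main__":
    sys_a = [{"keyword": "cat", "time": 1.2, "score": 0.9},
             {"keyword": "cat", "time": 7.5, "score": 0.3},
             {"keyword": "dog", "time": 3.1, "score": 0.4}]
    sys_b = [{"keyword": "cat", "time": 1.3, "score": 0.7},
             {"keyword": "dog", "time": 3.0, "score": 0.6}]
    for d in combine_systems(sum_to_one_normalize(sys_a),
                             sum_to_one_normalize(sys_b), w_a=0.6, w_b=0.4):
        print(d)
```

In a real KWS pipeline the interpolation weights would be chosen by sweeping them against MTWV on held-out data, mirroring the MTWV-based weight optimization the abstract describes.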
Recently, we have been investigating the application of kernel methods for fast speaker adaptation by exploiting possible non-linearity in the input speaker space. In this paper, we propose another solution based on kernelizing the eigenspace-based MLLR adaptation (EMLLR) method, which we call "kernel eigenspace-based MLLR adaptation" (KEMLLR). In KEMLLR, speaker-dependent (SD) models are estimated from a common speaker-independent (SI) model using MLLR adaptation; the SD MLLR transformation matrices are mapped to a kernel-induced high-dimensional feature space, and kernel principal component analysis is used to derive a set of eigenmatrices in that feature space. In addition, a composite kernel is used to preserve the row information in the transformation matrices. A new speaker's MLLR transformation matrix is then represented as a linear combination of the leading kernel eigenmatrices, which, although it exists only in the feature space, still allows the speaker's mean vectors to be found explicitly. As a result, at the end of KEMLLR adaptation, a regular HMM is obtained for the new speaker, and subsequent speech recognition is as fast as normal HMM decoding. KEMLLR adaptation was tested and compared with other adaptation methods (MAP, MLLR, EV, EMLLR, and eKEV) on the Resource Management and Wall Street Journal tasks using 5 s or 10 s of adaptation speech. In both cases, KEMLLR adaptation gives the greatest improvement over the SI model, with a word error rate reduction of 11-20%.
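As a rough illustration of the kernel-PCA step only (not of the composite kernel that preserves row information, nor of how the new speaker's combination weights are estimated from adaptation data), the following NumPy sketch vectorizes a set of training speakers' MLLR transforms, applies an assumed RBF kernel, and extracts the leading kernel eigen-directions onto which a new speaker's transform can be projected. The kernel choice, dimensions, and random data are hypothetical stand-ins.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1e-3):
    """RBF kernel between rows of X and rows of Y (an assumed kernel choice)."""
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq)

def kernel_pca(K, n_components):
    """Centered kernel PCA: returns eigenvalues and coefficient vectors (alphas)
    that define the leading eigen-directions in the induced feature space."""
    n = K.shape[0]
    one_n = np.ones((n, n)) / n
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n   # double centering
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]
    alphas = eigvecs / np.sqrt(np.maximum(eigvals, 1e-12))  # unit norm in feature space
    return eigvals, alphas

# Toy setup: each training speaker contributes one d x (d+1) MLLR transform,
# vectorized into one row of X_train (random numbers here, for illustration).
rng = np.random.default_rng(0)
d, n_speakers, n_eig = 8, 30, 5
X_train = rng.normal(size=(n_speakers, d * (d + 1)))

K_train = rbf_kernel(X_train, X_train)
eigvals, alphas = kernel_pca(K_train, n_eig)

# A new speaker's (toy) transform is projected onto the leading kernel
# eigen-directions; in KEMLLR proper, the combination weights would instead be
# estimated from the speaker's adaptation speech.
x_new = rng.normal(size=(1, d * (d + 1)))
k_new = rbf_kernel(x_new, X_train)
k_new_c = (k_new - k_new.mean(axis=1, keepdims=True)
           - K_train.mean(axis=0)[None, :] + K_train.mean())
coeffs = k_new_c @ alphas
print("coefficients on leading kernel eigenmatrices:", coeffs.ravel())
```

The point of working in the kernel-induced space is that the eigenmatrices themselves never need to be written down explicitly; only kernel evaluations against the training speakers are required, which is what makes it possible to recover the adapted mean vectors and hand back a regular HMM at the end.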