Current state-of-the-art speaker verification (SV) systems are known to be strongly affected by unexpected variability encountered at test time, such as environmental noise or changes in vocal effort. In this work, we analyze and evaluate articulatory information about tongue movement as a means of improving the performance of SV systems. We use a Spanish-language database that contains, in addition to the speech signals, articulatory information acquired with an ultrasound system. Two groups of features are proposed to represent the articulatory information, and the resulting performance is compared with that of an SV system trained only on acoustic information. Our results show that the proposed features carry highly discriminative information related to speaker identity; furthermore, they can complement and improve existing systems when combined with cepstral coefficients at the feature level.
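As a rough illustration of what combining information "at the feature level" can mean, the sketch below concatenates each frame's cepstral vector with an articulatory vector for the same frame. It is not the authors' implementation. The function name `fuse_features`, the feature dimensions, and the assumption that the two streams are already time-aligned frame by frame are all hypothetical.

```python
from typing import List, Sequence


def fuse_features(
    acoustic: Sequence[Sequence[float]],
    articulatory: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate per-frame acoustic and articulatory vectors.

    Assumption (not from the source): both streams have one vector per
    frame and are already aligned in time, so frame i of one stream
    corresponds to frame i of the other.
    """
    if len(acoustic) != len(articulatory):
        raise ValueError("streams must contain the same number of frames")
    # Each fused frame is the acoustic vector followed by the articulatory one.
    return [list(a) + list(b) for a, b in zip(acoustic, articulatory)]


# Hypothetical toy data: 2 frames, 3 cepstral values and 2 articulatory values each.
cepstral_frames = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
tongue_frames = [[0.1, 0.2], [0.3, 0.4]]
fused = fuse_features(cepstral_frames, tongue_frames)
```

Each fused frame here has five dimensions (three cepstral plus two articulatory), and a downstream speaker model would be trained on these longer vectors instead of on the cepstral coefficients alone.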