Voice is projected to be the next input interface for portable devices. The increased use of audio interfaces can be attributed mainly to the success of speech and speaker recognition technologies. With these advances comes the risk of criminal threats: attackers are reportedly trying to access sensitive information using diverse voice spoofing techniques. Among them, replay attacks pose a real challenge to voice biometrics. This paper addresses the problem by proposing a deep learning architecture in tandem with low-level cepstral features. We investigate the use of a deep neural network (DNN) to discriminate between the different channel conditions available in the ASVSpoof 2017 dataset, namely recording, playback and session conditions. The high-level feature vectors derived from this network are used to discriminate between genuine and spoofed audio. Two kinds of low-level features are utilized: state-of-the-art constant-Q cepstral coefficients (CQCC), and our proposed high-frequency cepstral coefficients (HFCC), which are derived from the high-frequency spectrum of the audio. The fusion of both features proved effective in generalizing across the diverse replay attacks seen in the evaluation of the ASVSpoof 2017 challenge, with an equal error rate of 11.5%, which is 53% better than the baseline Gaussian Mixture Model (GMM) applied on CQCC.
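The equal error rate (EER) quoted above is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how such a figure can be computed from detection scores; it is not the challenge's official scoring tool. The function names, the convention that higher scores mean "more likely genuine", and the toy scores are all assumptions made for illustration. Reading "53% better" as a relative EER reduction is likewise an assumption.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Return (eer, threshold) for a detector where higher scores
    mean 'more likely genuine'. Illustrative helper, not an official scorer."""
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, best_eer, best_thr = None, None, None
    for thr in thresholds:
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < thr for s in genuine_scores) / len(genuine_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


def relative_reduction(baseline_eer, system_eer):
    """Relative improvement of a system EER over a baseline EER."""
    return (baseline_eer - system_eer) / baseline_eer


# Toy scores (hypothetical, not taken from ASVSpoof 2017).
genuine = [0.9, 0.8, 0.7, 0.35]
spoof = [0.1, 0.2, 0.3, 0.4]
eer, thr = equal_error_rate(genuine, spoof)
print(f"EER = {eer:.2%} at threshold {thr}")
```

With a finite trial list the two error rates rarely cross exactly, so the sketch reports the average of the two at the closest threshold; production scorers typically interpolate along the ROC curve instead.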
This paper summarizes Pindrop Labs' submission to the multitarget speaker detection and identification challenge evaluation (MCE 2018). The MCE challenge is geared towards detecting blacklisted speakers (fraudsters) in the context of call centers. In particular, it aims to answer the following two questions: Is the speaker of the test utterance on the blacklist? If so, which of the blacklisted speakers is it? While a single system can answer both questions, this work treats them as two separate tasks: blacklist detection and closed-set identification. The former is addressed using four different systems: probabilistic linear discriminant analysis (PLDA), two deep neural network (DNN) based systems, and a simple system based on cosine similarity and logistic regression. The latter is addressed by combining PLDA and neural-network-based systems. The proposed system was the best-performing system at the challenge on both tasks, reducing the blacklist detection error (Top-S EER) by 31.9% and the identification error (Top-1 EER) by 46.4% over the MCE baseline on the evaluation data.
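To make the two tasks concrete, the sketch below shows one plausible way a cosine-similarity system could address both questions: the highest similarity against any blacklisted speaker drives the detection decision, and the speaker achieving it is the closest blacklist candidate. This is an illustration only, not the submitted system. The embeddings, the speaker identifiers, and the logistic-regression weight and bias are hypothetical; in practice the weights would be learned on development data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def score_against_blacklist(test_vec, blacklist):
    """Return (speaker_id, similarity) of the closest blacklisted speaker.

    `blacklist` maps a speaker id to one embedding per speaker
    (an assumed representation, e.g. an averaged utterance embedding).
    """
    best_id = max(blacklist, key=lambda spk: cosine(test_vec, blacklist[spk]))
    return best_id, cosine(test_vec, blacklist[best_id])


def blacklist_probability(similarity, weight=10.0, bias=-5.0):
    """One-feature logistic regression mapping similarity to a probability.

    The weight and bias are placeholder values, not trained parameters.
    """
    return 1.0 / (1.0 + math.exp(-(weight * similarity + bias)))


# Hypothetical two-dimensional embeddings for two blacklisted speakers.
blacklist = {"spk_a": [1.0, 0.0], "spk_b": [0.0, 1.0]}
speaker, sim = score_against_blacklist([0.9, 0.1], blacklist)
print(speaker, round(sim, 3), round(blacklist_probability(sim), 3))
```

Thresholding the probability addresses the detection question, and the returned speaker id is the candidate for identification once a trial is flagged as being on the blacklist.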