In recent years, the performance of automatic speaker verification (ASV) systems has improved significantly. However, they remain vulnerable to various kinds of spoofing attacks. In this paper, we propose a method that fuses different phase features and amplitude features to detect replay attacks. We apply the mel-scale relative phase feature and the source-filter vocal tract feature in the phase domain for replay attack detection. These two phase-based features are combined to obtain complementary information. In addition to these phase features, constant Q cepstral coefficients (CQCCs) are used. The proposed methods are evaluated on the ASVspoof 2017 challenge database, with a Gaussian mixture model as the back-end model. The proposed approach achieved a 55.6% relative error reduction compared with the conventional magnitude-based feature.
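The abstract names a Gaussian mixture model as the back-end but gives no configuration. A common way to use such a back-end for spoofing detection is to train one GMM on genuine speech and one on replayed speech, then score an utterance by the average per-frame log-likelihood ratio. The following is a minimal sketch under that assumption; the diagonal covariances, the function names, and all model parameters are illustrative and not taken from the paper.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(value - top) for value in logs))

def llr_score(frames, genuine_gmm, spoof_gmm):
    """Average per-frame log-likelihood ratio; higher means more likely genuine."""
    total = sum(
        gmm_log_likelihood(f, genuine_gmm) - gmm_log_likelihood(f, spoof_gmm)
        for f in frames
    )
    return total / len(frames)

# Toy two-component models over 2-dimensional features (made-up parameters,
# standing in for models that would normally be trained with EM).
genuine_gmm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])]
spoof_gmm = [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])]

frames_near_genuine = [[0.2, -0.1], [0.9, 1.1]]
frames_near_spoof = [[4.1, 3.9], [5.2, 4.8]]

print(llr_score(frames_near_genuine, genuine_gmm, spoof_gmm))  # positive
print(llr_score(frames_near_spoof, genuine_gmm, spoof_gmm))    # negative
```

In practice the frames would be feature vectors (for example CQCCs or phase features) and the utterance would be accepted as genuine when the score exceeds a threshold.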
The importance of the phase information of speech signals is attracting attention. Many studies indicate that combining systems based on amplitude and phase features is effective for improving speaker recognition performance in noisy environments. Speech enhancement, on the other hand, is commonly used to reduce the influence of noise. However, this approach enhances only the amplitude spectrum; therefore, the noisy phase spectrum is used to reconstruct the estimated signal. In recent years, DNN-based feature enhancement has been studied intensively for robust speech processing. This approach is also expected to be effective for phase-based features. In this paper, we propose feature-space enhancement of amplitude and phase features using a deep neural network (DNN) for speaker identification. We used mel-frequency cepstral coefficients as the amplitude feature and modified group delay cepstral coefficients as the phase feature. Simultaneous enhancement of the amplitude- and phase-based features was effective, achieving about a 24% relative error reduction compared with individual feature enhancement.
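For readers unfamiliar with the metric, relative error reduction is usually computed as the drop in error rate divided by the baseline error rate. The sketch below shows that arithmetic; the function name is hypothetical and the two error rates are placeholders chosen only to illustrate the calculation, since the abstract does not report the underlying values.

```python
def relative_error_reduction(baseline_error, proposed_error):
    """Return the relative error reduction in percent."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - proposed_error) / baseline_error

# Placeholder error rates (in percent), NOT taken from the paper.
baseline = 25.0
proposed = 19.0
print(relative_error_reduction(baseline, proposed))  # 24.0
```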
Many studies address distinguishing human speech from artificially generated speech, as well as automatic speaker verification (ASV), which aims to determine whether a given utterance belongs to a given speaker. Recent studies demonstrate the success of the relative phase (RP) feature in speaker recognition/verification and in the detection of synthesized and converted speech. However, few studies focus on the RP feature for replay attack detection. In this paper, we improve the discriminating ability of the RP feature by proposing two new auditory filter-based RP features for replay attack detection. The key idea is to integrate the advantage of RP-based features in signal representation with the advantage of two auditory filter-based RP features. For the first proposed feature, we apply a Mel filter bank to convert the signal representation of conventional RP information from a linear scale to a Mel scale; the modified representation is called the Mel-scale RP feature. For the other proposed feature, a gammatone filter bank is applied to scale the RP information; the scaled RP feature is called the gammatone-scale RP feature. These two proposed phase-based features are implemented to achieve better performance than a conventional RP feature because of the scale resolution and. In addition to the individual Mel/gammatone-scale RP features, a combination of the scores of these proposed RP features and a standard magnitude-based feature, the constant Q transform cepstral coefficient (CQCC), is also applied to further improve the reliability of the detection decision. The effectiveness of the proposed Mel-scale RP feature, the gammatone-scale RP feature, and their combination is evaluated using the ASVspoof 2017 dataset. On the evaluation dataset, our proposed methods demonstrate significant improvement over the existing RP feature and the baseline CQCC feature. The combination of the CQCC and gammatone-scale RP features provides the best performance compared with the individual baseline feature and other combination methods.
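The score combination described above can be illustrated with a simple weighted sum of per-utterance scores from two systems, evaluated with an approximate equal error rate (EER). This is a minimal sketch assuming linear score-level fusion with a fixed weight; the abstract does not specify the fusion rule, and the function names, the weight, and all scores below are made-up toy values, not results from the paper.

```python
def fuse_scores(scores_a, scores_b, weight_a=0.5):
    """Weighted sum of two per-utterance score lists (assumed fusion rule)."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(scores_a, scores_b)]

def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate EER by sweeping the threshold over all observed scores."""
    best_gap, eer = None, None
    for t in sorted(set(genuine_scores) | set(spoof_scores)):
        # False rejection: genuine utterances scoring below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: spoofed utterances scoring at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Toy scores for four genuine and four replayed utterances from two
# hypothetical systems (for example, one CQCC-based and one RP-based).
genuine_a = [2.0, 1.5, -0.5, 1.0]
genuine_b = [1.0, -0.2, 1.8, 0.9]
spoof_a = [-1.0, 0.5, -2.0, -1.5]
spoof_b = [-1.2, -1.0, 0.3, -0.8]

genuine_fused = fuse_scores(genuine_a, genuine_b)
spoof_fused = fuse_scores(spoof_a, spoof_b)

print(equal_error_rate(genuine_a, spoof_a))
print(equal_error_rate(genuine_b, spoof_b))
print(equal_error_rate(genuine_fused, spoof_fused))
```

The toy scores are constructed so that each system makes a different mistake, which is the situation in which fusing complementary features can help. In a real setup the fusion weight would be tuned on a development set rather than fixed at 0.5.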