The ASVspoof 2017 challenge is about the detection of replayed speech from human speech. The proposed system makes use of the fact that when the speech signals are replayed, they pass through multiple channels as opposed to original recordings. This channel information is typically embedded in low signal to noise ratio regions. A speech signal processing method with high spectro-temporal resolution is required to extract robust features from such regions. The single frequency filtering (SFF) is one such technique, which we propose to use for replay attack detection. While SFF based feature representation was used at front-end, Gaussian mixture model and bi-directional long short-term memory models are investigated at the backend as classifiers. The experimental results on ASVspoof 2017 dataset reveal that, SFF based representation is very effective in detecting replay attacks. The score level fusion of back end classifiers further improved the performance of the system which indicates that both classifiers capture complimentary information.
Automatic speaker verification systems are vulnerable to spoofing attacks. Recently, various countermeasures have been developed for detecting high technology attacks such as speech synthesis and voice conversion. However, there is a wide gap in dealing with replay attacks. In this paper, we propose a new feature for replay attack detection based on single frequency filtering (SFF), which provides high temporal and spectral resolution at each instant. Single frequency filtering cepstral coefficients (SFFCC) with Gaussian mixture model classifier are used for the experimentation on the standard BTAS-2016 corpus. The previously reported best result, which is based on constant Q cepstral coefficients (CQCC) achieved a half total error rate of 0.67 % on this data-set. Our proposed method outperforms the state of the art (CQCC) with a half total error rate of 0.0002 %.
The ASVspoof 2019 challenge focuses on countermeasures for all major spoofing attacks, namely speech synthesis (SS), voice conversion (VC), and replay spoofing attacks. This paper describes the IIIT-H spoofing countermeasures developed for ASVspoof 2019 challenge. In this study, three instantaneous cepstral features namely, single frequency cepstral coefficients, zero time windowing cepstral coefficients, and instantaneous frequency cepstral coefficients are used as front-end features. A Gaussian mixture model is used as back-end classifier. The experimental results on ASVspoof 2019 dataset reveal that the proposed instantaneous features are efficient in detecting VC and SS based attacks. In detecting replay attacks, proposed features are comparable with baseline systems. Further analysis is carried out using metadata to assess the impact of proposed countermeasures on different synthetic speech generating algorithm/replay configurations.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.