Toward a Universal Synthetic Speech Spoofing Detection Using Phase Information

Sánchez, Jon; Saratxaga, Ibon; Hernáez, Inmaculada; Navas, Eva; Erro, Daniel; Raitio, Tuomo

doi:10.1109/tifs.2015.2398812

Cited by 77 publications

(41 citation statements)

References 32 publications


“…Considering the phase information is usually neglected by many synthetic techniques, phase-based features are typically used for anti-spoofing, such as modified group delay (MGD) [6,7,8,9], cosine-normalized phase [6,9], relative phase shift (RPS) [10,11,12,13], cochlear filter cepstral coefficients plus instantaneous frequency (CFCCIF) [14]. Modulation-based features have been used in [15] to detect temporal artifacts.…”

Section: Introduction (mentioning)

confidence: 99%

Spoofing detection from a feature representation perspective

Tian, Xiao, et al. 2016

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Spoofing detection, which discriminates spoofed speech from natural speech, has gained much attention recently. Low-dimensional features that are used in speaker recognition/verification are also used in spoofing detection. Unfortunately, they don't capture sufficient information required for spoofing detection. In this work, we investigate the use of high-dimensional features for spoofing detection, which may be more sensitive to the artifacts in the spoofed speech. Six types of high-dimensional features are employed. For each kind of feature, four different representations are extracted, i.e., the original high-dimensional feature, the corresponding low-dimensional feature, and the low- and high-frequency regions of the original high-dimensional feature. Dynamic features are also calculated to assess the effectiveness of the temporal information in detecting artifacts across frames. A neural network-based classifier is adopted to handle the high-dimensional features. Experimental results on the standard ASVspoof 2015 corpus suggest that high-dimensional features and dynamic features are useful for spoofing attack detection. A fusion of them has been shown to achieve equal error rates of 0.0% for nine of ten attack types.
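The equal error rate (EER) quoted in this abstract is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how such a figure could be computed from detector scores; the function name, the score lists, and the convention that higher scores mean "more likely natural speech" are illustrative assumptions, not details taken from the cited paper.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Return (eer, threshold) at the point where FAR and FRR are closest.

    Assumption: a higher score means "more likely genuine (natural) speech".
    """
    best = None
    # Every observed score is a candidate decision threshold.
    for t in sorted(set(genuine_scores) | set(spoof_scores)):
        # False rejection rate: genuine trials scoring below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: spoofed trials scoring at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores: one genuine trial and one spoofed trial overlap.
genuine = [0.9, 0.8, 0.3, 0.7]
spoofed = [0.1, 0.2, 0.75, 0.4]
eer, threshold = equal_error_rate(genuine, spoofed)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

An EER of 0.0%, as reported for nine of the ten attack types, corresponds to score distributions that do not overlap at all, so some threshold separates every spoofed trial from every genuine one.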



“…Recently, with the improvement of automatic speech generation methods, speech produced by voice conversion (VC) [2] [3] and speech synthesis (SS) [4] [5] techniques has been used to attack ASV systems. Over the past few years, much research has been devoted to protect ASV systems against spoofing attack [6] [7] [8].…”

Section: Introduction (mentioning)

confidence: 99%

DNN Filter Bank Cepstral Coefficients for Spoofing Detection

Tan, Zhang, et al. 2017

IEEE Access


With the development of speech synthesis techniques, automatic speaker verification systems face the serious challenge of spoofing attacks. In order to improve the reliability of speaker verification systems, we develop a new filter bank based cepstral feature, deep neural network filter bank cepstral coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The deep neural network filter bank is automatically generated by training a filter bank neural network (FBNN) using natural and synthetic speech. By adding restrictions on the training rules, the learned weight matrix of FBNN is band-limited and sorted by frequency, similar to the normal filter bank. Unlike the manually designed filter bank, the learned filter bank has different filter shapes in different channels, which can capture the differences between natural and synthetic speech more effectively. The experimental results on the ASVspoof 2015 database show that the Gaussian mixture model maximum-likelihood (GMM-ML) classifier trained on the new feature performs better than the state-of-the-art linear frequency cepstral coefficients (LFCC) based classifier, especially on detecting unknown attacks. Index Terms: speaker verification, spoofing detection, DNN filter bank cepstral coefficients, filter bank neural network.
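Filter bank cepstral features such as LFCC are conventionally obtained by taking the logarithm of the filter bank output energies and applying a discrete cosine transform. The sketch below illustrates only that generic step on a single frame; the function name, the energy floor, and the example values are assumptions for illustration, and the learned FBNN filter bank described in the abstract is not reproduced here.

```python
import math


def cepstral_coefficients(filterbank_energies, num_coeffs):
    """Unnormalized DCT-II of log filter bank energies for one frame.

    Assumption: the energies come from any filter bank (manually designed
    or learned); a small floor avoids taking log(0).
    """
    log_e = [math.log(max(e, 1e-10)) for e in filterbank_energies]
    n = len(log_e)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(log_e))
        for k in range(num_coeffs)
    ]


# Hypothetical energies from an 8-channel filter bank for one frame.
frame_energies = [2.1, 1.7, 1.2, 0.9, 0.6, 0.4, 0.3, 0.2]
print(cepstral_coefficients(frame_energies, num_coeffs=4))
```

Changing the filter bank that produces the energies, for instance swapping fixed linear filters for learned ones, leaves this step unchanged, which is why such features can be compared under the same back-end classifier.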


“…The methods to detect such attacks, whether generated by voice conversion or speech synthesis algorithms, have mainly focused on the use of features such as the signal phase [8], [9], cepstral coefficients [10]-[12], pitch patterns [13], [14] or the long-term modulation spectrum [15]. There are also approaches that are based on the detection of "pop noise" [16].…”

Section: Introduction (mentioning)

confidence: 99%

Presentation Attack Detection Using Long-Term Spectral Statistics for Trustworthy Speaker Verification

Muckenhirn, Magimai-Doss, Marcel 2016

2016 International Conference of the Biometrics Special Interest Group (BIOSIG)


Abstract: In recent years, there has been a growing interest in developing countermeasures against non-zero-effort attacks for speaker verification systems. Until now, the focus has been on logical access attacks, where the spoofed samples are injected into the system through a software-based process. This paper investigates a more realistic type of attack, referred to as physical access or presentation attacks, where the spoofed samples are presented as input to the microphone. To detect such attacks, we propose a binary classifier based approach that uses long-term spectral statistics as feature input. Experimental studies on the AVspoof database, which contains presentation attacks based on replay, speech synthesis and voice conversion, show that the proposed approach can yield a significantly low detection error rate with a linear classifier (half total error rate of 0.038%). Furthermore, an investigation on the Interspeech 2015 ASVspoof challenge dataset shows that it is equally capable of detecting logical access attacks.

