An Efficient Learning Based Smartphone Playback Attack Detection Using GMM Supervector

Chün, Wang; Zou, Yuexian; Liu, Shihan; Shi, Wei; Zheng, Weiqiao

doi:10.1109/bigmm.2016.14

Cited by 4 publications

(3 citation statements)

References 16 publications


“…The earlier ones [24][25][26] were based on small-scale databases, where only a small number of playback and recording conditions were taken into account. For example, in [24,27], three playback and recording devices were used to collect the database; in [25,28], one recording device and one playback device were used to create the database, which is named the authentic and playback speech database (APSD); in [29], the database was built with four smartphones; and in [26], four devices were used to create the playback utterances in the database, which is named audio-visual spoofing 2015 (AVspoof 2015). Different from the above databases, the launch of the ASVspoof 2017 corpus provided a large common database, obtained using 26 playback devices, 25 recording devices, and 26 environments [1,2,30].…”

Section: Related Work (mentioning)

confidence: 99%

Discriminative features based on modified log magnitude spectrum for playback speech detection

Yang, Ren, et al. 2020

J AUDIO SPEECH MUSIC PROC.


In order to improve the performance of hand-crafted features to detect playback speech, two discriminative features, constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients, are proposed for playback speech detection in this work. They rely on our findings that variance-based modified log magnitude spectrum and mean-based modified log magnitude spectrum can enhance the discriminative power between genuine speech and playback speech. Then constant-Q variance-based octave coefficients (constant-Q mean-based octave coefficients) can be obtained by combining variance-based modified log magnitude spectrum (mean-based modified log magnitude spectrum), octave segmentation, and discrete cosine transform. Finally, constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients are evaluated on ASVspoof 2017 corpus version 2.0 and ASVspoof 2019 physical access, respectively. Experimental results show that variance-based modified log magnitude spectrum and mean-based modified log magnitude spectrum can produce discriminative features toward playback speech. Further results on the two databases show that constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients can perform better than some common features, such as mel frequency cepstral coefficients and constant-Q cepstral coefficients.



“…Although considered the easiest form of spoofing (e.g., no special expertise nor equipment is required [5]), to date only a few studies have addressed replay attacks when compared to other forms of spoofing. For example, in [6], the authors present a playback attack detector (PAD) based on a Gaussian mixture model (GMM) supervector (GSV) with a binary classifier based on a support vector machine (SVM). The authors in [7] rely on spectral bitmaps or spectral peaks, which are time-frequency points higher than a pre-defined threshold.…”

Section: Introduction (mentioning)

confidence: 99%
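The GMM supervector (GSV) front end named in this statement can be illustrated with a minimal sketch. It assumes the common recipe of MAP-adapting the means of a diagonal-covariance universal background model (UBM) to one utterance and stacking the adapted means; this is an assumption for illustration, not the cited authors' implementation. The function names and the relevance factor of 16 are hypothetical choices.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def posteriors(x, weights, means, variances):
    """Responsibility of each mixture component for one frame x."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    peak = max(logs)  # subtract the peak for numerical stability
    exps = [math.exp(l - peak) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def gmm_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the UBM means to `frames` and stack them into one vector.

    `relevance` is an assumed relevance factor, not a value from the source.
    """
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp                      # soft frame counts
    sums = [[0.0] * dim for _ in range(n_comp)]  # posterior-weighted sums
    for x in frames:
        for c, g in enumerate(posteriors(x, weights, means, variances)):
            counts[c] += g
            for d in range(dim):
                sums[c][d] += g * x[d]
    supervector = []
    for c in range(n_comp):
        denom = counts[c] + relevance
        alpha = counts[c] / denom if denom > 0 else 0.0
        for d in range(dim):
            # Fall back to the UBM mean when a component saw no data.
            ex = sums[c][d] / counts[c] if counts[c] > 0 else means[c][d]
            supervector.append(alpha * ex + (1.0 - alpha) * means[c][d])
    return supervector
```

In a detector of the kind described, one such vector per utterance would then be passed to a binary classifier such as an SVM; that step is omitted here.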

Blind Channel Response Estimation for Replay Attack Detection

Avila, Alam, O’Shaughnessy, et al. 2019

Interspeech 2019


Recently, automatic speaker verification (ASV) systems have been acknowledged to be vulnerable to replay attacks. Multiple efforts have been made by the research community to improve ASV robustness. In this paper, we propose a replay attack countermeasure based on the blind estimation of the magnitude of channel responses. For that, the log-spectrum average of the clean speech signal is predicted from a Gaussian mixture model (GMM) of RASTA filtered mel-frequency cepstral coefficients (MFCCs) trained on clean speech. The magnitude response of the channel is obtained by subtracting the log-spectrum of the observed signal from the predicted log-spectrum average of the clean signal. Two datasets are used in our experiments: (1) the TIMIT dataset, which is used to train the log-spectrum average of the clean signal; and (2) a dataset containing replay attacks used during the second Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2017). Performance is compared to two benchmarks: the discrete Fourier transform power spectral (DFTspec) and the constant Q cepstral coefficients (CQCCs). Results show the proposed method outperforming the two benchmarks in most scenarios, with equal error rate (EER) as low as 6.87% when testing on the development set and as low as 11.28% on the evaluation set.
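The equal error rate (EER) quoted in this abstract is the operating point at which the false acceptance rate and the false rejection rate coincide. A minimal sketch of one way to approximate it is given below, assuming that a higher score means "more likely genuine" and that every observed score is tried as a threshold. The function name is hypothetical, and evaluation toolkits may interpolate between thresholds rather than pick the nearest one, so their values can differ slightly.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumed convention: a trial is accepted when its score >= threshold.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine_scores) | set(spoof_scores)):
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: genuine trials scored below the threshold.
        frr = sum(g < t for g in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

For example, with genuine scores `[0.9, 0.8, 0.7, 0.4]` and spoof scores `[0.6, 0.3, 0.2, 0.1]`, the threshold 0.6 misclassifies one trial of each kind, giving an EER of 0.25.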


“…This type mainly utilizes a machine learning algorithm to learn the differences. An example is Wang et al.'s [14] use of a support vector machine [15] to learn the difference in Mel-frequency cepstral coefficient (MFCC)-based acoustic features.…”

Section: Introduction (mentioning)

confidence: 99%

Transforming acoustic characteristics to deceive playback spoofing countermeasures of speaker verification systems

Fang, Yamagishi, Echizen, et al. 2018

2018 IEEE International Workshop on Information Forensics and Security (WIFS)


Automatic speaker verification (ASV) systems use a playback detector to filter out playback attacks and ensure verification reliability. Since current playback detection models are almost always trained using genuine and played-back speech, it may be possible to degrade their performance by transforming the acoustic characteristics of the played-back speech to be close to those of the genuine speech. One way to do this is to enhance speech "stolen" from the target speaker before playback. We tested the effectiveness of a playback attack using this method by using the speech enhancement generative adversarial network to transform acoustic characteristics. Experimental results showed that use of this "enhanced stolen speech" method significantly increases the equal error rates for the baseline used in the ASVspoof 2017 challenge and for a light convolutional neural network-based method. The results also showed that its use degrades the performance of a Gaussian mixture model-universal background model-based ASV system. This type of attack is thus an urgent problem needing to be solved.
