Replay attack detection with auditory filter-based relative phase features

Oo, Zeyan; Wang, Longbiao; Phapatanaburi, Khomdet; Li, Meng; Nakagawa, Seiichi; Iwahashi, Masahiro; Dang, Jianwu

doi:10.1186/s13636-019-0151-2

Cited by 21 publications

(10 citation statements)

References 30 publications

(62 reference statements)

“…By comparing the proposed LPR-RP/LPAES-RP with the LPR-MFCC, we noticed that the phase-based feature provided encouraging performance for replay attack detection. Likewise, the LFMGDCC in [41] and DCT-linear-RPS in [41], IFCC [33], mel-RP in [36], and gammatone-RP in [39] were confirmed to be efficient phase features under unseen evaluation. However, the LFMGDCC feature may lose some representation in the vocal source information of the given speech due to the enhancements of the envelope of the short-time speech spectrum, and the phase shift variation in the DCT-Linear-RPS is not normalized by cutting positions.…”

Section: B Results On the Evaluation Subset (mentioning)

confidence: 91%

“…Although the RP has been successfully implemented for spoofing attack detection, its discriminating ability for replay attack detection can be further improved using a frequency resolution. In our previous work [36]-[39], mel/inverted mel-scale filterbank, linear-scale filterbank, attention-based adaptive filterbank, and gammatone-scale filterbank were applied to convert the RP information from the original linear scale to new scales (such as mel-scale, inverted mel-scale, and gammatone-scale), where the modified RP information is called mel-scale RP [37], inverted mel-scale RP (IMel-RP) [36], linear-scale RP (linear-RP) [36], adaptive scale RP (ARP) [38], gammatone-scale RP (gammatone-RP) [39], respectively. The results demonstrated that the mel-RP, ARP, and gammatone-RP outperformed the original RP feature due to the frequency resolution of the filterbank.…”

Section: Introduction (mentioning)

confidence: 95%

“…The results demonstrated that the mel-RP, ARP, and gammatone-RP outperformed the original RP feature due to the frequency resolution of the filterbank. Moreover, additional improvement could be obtained by combining the mel-RP/ARP/gammatone-RP with MFCC/IMFCC/CQCC at score level, especially in the combination of the gammatone-RP and CQCC yielding the best result compared with the related auditory filterbank-based RP features [36]-[39]. However, the modified phase information using the filterbank may lose some information for score combination with magnitude-based features.…”

Section: Introduction (mentioning)

confidence: 99%

Replay Attack Detection Using Linear Prediction Analysis-Based Relative Phase Features

et al. 2019

Self Cite

Recent studies have reported the success of linear prediction analysis (LPA)-related features, which are extracted as short-term spectral features for replay attack detection and exploit the imperfections that recording and playback devices leave in LPA-based signals. However, work on LPA-based signals has focused only on magnitude-based features and has ignored phase-based features. In this paper, we propose two novel LPA-based relative phase features, namely, linear prediction residual-based relative phase (LPR-RP) and linear prediction analysis estimated speech-based relative phase (LPAES-RP). The key idea of both LPR-RP and LPAES-RP is to extract the phase information from LPA-based signals. In the LPR-RP feature, we modify the relative phase (RP) feature extraction to use a linear prediction residual (LPR), derived from the difference between the original/raw speech and the LPA estimated speech signal (LPAES), instead of the original/raw speech signal. The LPAES-RP feature uses the LPAES signal in place of the original/raw speech signal. Because the traces of recording and playback device artifacts are the primary evidence for detecting a replayed signal, the imperfections of the LPR and LPAES are expected to provide efficient phase information for the replay attack detection task. In addition to using the LPR-RP/LPAES-RP features individually, we combine them at score level with two standard features, mel-frequency cepstral coefficients (MFCC) and constant Q transform cepstral coefficients (CQCC), and with the original RP feature, to further improve the detection decision. The performance of the proposed LPR-RP/LPAES-RP features and their combinations is evaluated using the ASVspoof 2017 version 2 database. On the evaluation subset, our proposed LPR-RP and LPAES-RP features achieve a promising improvement over the baseline features (MFCC/CQCC). Moreover, the combined system of LPR-RP, RP, and CQCC obtains an equal error rate of 9.26%.
INDEX TERMS: Replay attack detection, phase information, speaker recognition anti-spoofing, linear prediction analysis-based feature.
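
The abstract above describes the LPR as the difference between the raw speech and the LPA estimated speech (LPAES). A minimal Python sketch of that decomposition for a single frame follows. It assumes the autocorrelation method with the Levinson-Durbin recursion, which the abstract does not specify; the function names `lpc` and `lpa_decompose`, the prediction order, and the test tone are hypothetical choices for illustration, not the authors' implementation. The relative phase extraction that would follow is not shown.

```python
import math

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def lpc(x, order):
    """Predictor coefficients a[1..order] with x_hat[n] = sum_k a[k] * x[n-k].

    Assumption: autocorrelation method solved by Levinson-Durbin.
    """
    r = autocorr(x, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted frame
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]

def lpa_decompose(x, order):
    """Return (LPAES, LPR): the LPA estimated signal and the residual x - LPAES."""
    a = lpc(x, order)
    est = []
    for n in range(len(x)):
        # Samples before the frame start are treated as zero.
        est.append(sum(a[k] * x[n - 1 - k] for k in range(order) if n - 1 - k >= 0))
    residual = [xn - en for xn, en in zip(x, est)]
    return est, residual

# Hypothetical usage: a pure tone is almost fully predictable with order 2,
# so most of its energy ends up in the LPAES and little in the LPR.
frame = [math.sin(0.3 * n) for n in range(200)]
lpaes, lpr = lpa_decompose(frame, order=2)
print(sum(v * v for v in lpr) / sum(v * v for v in frame))
```

In this sketch either returned signal could stand in for the raw frame in a later phase-extraction step, which mirrors how the abstract distinguishes LPR-RP from LPAES-RP.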

“…In the antispoofing competition ASV2017 [31], Witkowski, et al [32] pointed out that replay attacks can be detected by analyzing the high-frequency band of the replayed recordings. Zeyan et al [33] improved the discriminating ability of the relative phase (RP) features by proposing two new auditory filter-based RP features for replay attack detection. To detect the remote attacker, Lee et al [12] proposed a sonar-based liveness detection system.…”

Section: A Audible Attack On VCSs (mentioning)

confidence: 99%

EarArray: Defending against DolphinAttack via Acoustic Attenuation

Zhang

Li

et al. 2021

Proceedings 2021 Network and Distributed System Security Symposium

DolphinAttacks (i.e., inaudible voice commands) modulate audible voices over ultrasounds to inject malicious commands silently into voice assistants and manipulate controlled systems (e.g., doors or smart speakers). Eliminating DolphinAttacks is challenging, if possible at all, since it requires modifying the microphone hardware. In this paper, we design EarArray, a lightweight method that can not only detect such attacks but also identify the direction of attackers without requiring any extra hardware or hardware modification. Essentially, inaudible voice commands are modulated on ultrasounds, which inherently attenuate faster than audible sounds. By inspecting the command sound signals via the built-in multiple microphones on smart devices, EarArray is able to estimate the attenuation rate and thus detect the attacks. We propose a model of the propagation of audible sounds and ultrasounds from the sound source to a voice assistant, e.g., a smart speaker, and illustrate the underlying principle and its feasibility. We implemented EarArray using two specially-designed microphone arrays, and our experiments show that EarArray can detect inaudible voice commands with an accuracy of 99% and recognize the direction of the attackers with an accuracy of 97.89%.

“…To utilize the phase information, Tom et al [18] used the group delay function (GD) in replay detection. Oo et al [19] introduced the relative phase (RP) feature and further extended it in the Mel-scale (Mel-RP) and the gammatone-scale (Gamma-RP). Phapatanaburi et al [20] proposed to extract RP based on the linear prediction analysis (LPA), which extracted RP on the residual signal of LPA.…”

Section: I Related Work (mentioning)

confidence: 99%

A multi-branch ResNet with discriminative features for detection of replay speech signals

Cheng

Zheng

2020

SIP

Nowadays, the security of ASV systems is increasingly gaining attention. As one of the common spoofing methods, replay attacks are easy to implement but difficult to detect. Many researchers focus on designing various features to detect the distortion of replay attack attempts. Constant-Q cepstral coefficients (CQCC), based on the magnitude of the constant-Q transform (CQT), is one of the striking features in the field of replay detection. However, it ignores phase information, which may also be distorted in the replay processes. In this work, we propose a CQT-based modified group delay feature (CQTMGD) which can capture the phase information of CQT. Furthermore, a multi-branch residual convolution network, ResNeWt, is proposed to distinguish replay attacks from bonafide attempts. We evaluated our proposal in the ASVspoof 2019 physical access dataset. Results show that CQTMGD outperformed the traditional MGD feature, and the fusion with other magnitude-based and phase-based features achieved a further improvement. Our best fusion system achieved 0.0096 min-tDCF and 0.39% EER on the evaluation set and it outperformed all the other state-of-the-art methods in the ASVspoof 2019 physical access challenge.
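
The abstract above builds on group delay as a way to capture phase information. As background only, here is a minimal Python sketch of the plain DFT-based group delay function, computed without phase unwrapping from the identity tau(k) = (X_R Y_R + X_I Y_I) / |X|^2, where X is the DFT of x[n] and Y is the DFT of n·x[n]. This is not the CQTMGD feature or the modified group delay from the cited work; the function names and the `eps` floor are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform, kept dependency-free."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def group_delay(x, eps=1e-12):
    """Group delay in samples at each DFT bin.

    tau(k) = Re(Y[k] * conj(X[k])) / |X[k]|^2, with Y the DFT of n * x[n].
    eps is an assumed floor that avoids division by zero at spectral nulls.
    """
    X = dft(x)
    Y = dft([t * v for t, v in enumerate(x)])
    return [
        (Xk.real * Yk.real + Xk.imag * Yk.imag) / max(abs(Xk) ** 2, eps)
        for Xk, Yk in zip(X, Y)
    ]

# Hypothetical usage: an impulse delayed by 3 samples has a constant
# group delay of 3 samples at every frequency bin.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(group_delay(impulse))
```

Near spectral nulls the denominator becomes small and the raw group delay spikes, which is the usual motivation for smoothed or modified variants.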
