Multiple Phase Information Combination for Replay Attacks Detection

Li, Dongbo; Wang, Longbiao; Dang, Jianwu; Liu, Meng; Oo, Zeyan; Nakagawa, Seiichi; Guan, Haotian; Li, Xiangang

doi:10.21437/interspeech.2018-2001

Cited by 25 publications (16 citation statements)

References 22 publications (32 reference statements)

“…In the present study, we extend our previous paper [31] by performing more experiments and analysis. The contributions of this study are as follows: (1) According to [31], the detection model in the previous work, which used only a training subset of the ASVspoof 2017 database, might not lead to an impressive result on the evaluation of subset-based testing; hence, we incorporate a training subset and development subset to train the model in our system. (2) In addition to the Mel-filter bank represented to scale the RP information in the previous work, we also apply a gammatone filter bank, thereby simulating the auditory system of humans to convert the important information of the full-band RP-based feature in the gammatone scale, where the RP feature with the reduced dimension is called the gammatone-scale RP.…”

Section: Introduction (supporting)
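
The gammatone-scale RP described in this excerpt can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes center frequencies equally spaced on the Glasberg-Moore ERB-rate scale and uses the common frequency-domain approximation of a 4th-order gammatone magnitude response, |H(f)| ≈ [1 + ((f − fc)/b)^2]^(−order/2) with b = 1.019·ERB(fc), to pool a full-band RP vector (one value per FFT bin) into fewer bands. The function names, the band count, f_min, and the row normalization are hypothetical choices.

```python
import numpy as np


def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def erb_rate(f_hz):
    """ERB-rate scale: number of ERBs below f_hz."""
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)


def inverse_erb_rate(e):
    """Map ERB-rate values back to Hz."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def gammatone_weights(n_bands, n_fft, sr, f_min=50.0, order=4):
    """Approximate gammatone magnitude responses sampled at the rFFT bins.

    Returns an (n_bands, n_fft // 2 + 1) matrix whose rows sum to 1
    (the row normalization is an assumption of this sketch).
    """
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)  # rFFT bin frequencies
    centers = inverse_erb_rate(
        np.linspace(erb_rate(f_min), erb_rate(sr / 2.0), n_bands))
    b = 1.019 * erb(centers)  # gammatone bandwidth parameter per band
    # |H(f)| ~ [1 + ((f - fc) / b)^2]^(-order / 2) around each center frequency
    w = (1.0 + ((freqs[None, :] - centers[:, None]) / b[:, None]) ** 2) ** (-order / 2.0)
    return w / w.sum(axis=1, keepdims=True)


def gammatone_scale_rp(rp_full, weights):
    """Reduce a full-band RP vector (e.g. cos or sin of the RP) to n_bands values."""
    return weights @ np.asarray(rp_full, dtype=float)


if __name__ == "__main__":
    w = gammatone_weights(n_bands=40, n_fft=512, sr=16000)
    rng = np.random.default_rng(0)
    rp_cos = np.cos(rng.uniform(-np.pi, np.pi, 512 // 2 + 1))  # stand-in RP values
    print(gammatone_scale_rp(rp_cos, w).shape)  # (40,)
```

Because each row is a non-negative weighting that sums to 1, every band value stays within the range of the input RP values (here [-1, 1]), while the feature dimension drops from n_fft/2 + 1 bins to n_bands.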

“…Additionally, preliminary experiments have indicated that the Mel-scale RP provides better performance compared with the MGD cepstral coefficient (MGDCC) and MFCC, and CQCC. However, the detailed Mel-scale RP information extraction and analysis were not described in [31]. In the present study, we extend our previous paper [31] by performing more experiments and analysis.…”

Section: Introduction (mentioning)
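
Because the excerpt notes that the Mel-scale RP extraction details were not described in [31], the following is only a hedged sketch of one way it could be computed. It assumes the RP normalization in which the phase at a base frequency ω_b is fixed to zero, θ̃(ω) = θ(ω) − (ω/ω_b)·θ(ω_b), with each frame represented by cos θ̃ and sin θ̃. The 1000 Hz base frequency, the use of a triangular Mel filter bank as a weighted average, and all function names are assumptions for illustration.

```python
import numpy as np


def relative_phase(frame, n_fft, sr, base_hz=1000.0):
    """cos/sin of the phase normalized so the base-frequency bin has zero phase."""
    spec = np.fft.rfft(frame, n=n_fft)
    theta = np.angle(spec)
    k = np.arange(spec.size)
    k_b = int(round(base_hz * n_fft / sr))  # rFFT bin closest to base_hz (assumed 1 kHz)
    theta_rel = theta - (k / k_b) * theta[k_b]  # theta(w) - (w / w_b) * theta(w_b)
    return np.cos(theta_rel), np.sin(theta_rel)


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_weights(n_bands, n_fft, sr):
    """Triangular Mel filters over the rFFT bins, each row normalized to sum to 1."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_bands + 2))
    lo, center, hi = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs - lo) / (center - lo)
    falling = (hi - freqs) / (hi - center)
    w = np.maximum(0.0, np.minimum(rising, falling))
    return w / w.sum(axis=1, keepdims=True)


def mel_scale_rp(frame, n_fft, sr, n_bands=40):
    """Hypothetical Mel-scale RP: Mel-weighted averages of the cos and sin RP values."""
    cos_rp, sin_rp = relative_phase(frame, n_fft, sr)
    w = mel_weights(n_bands, n_fft, sr)
    return np.concatenate([w @ cos_rp, w @ sin_rp])


if __name__ == "__main__":
    sr, n_fft = 16000, 512
    frame = np.random.default_rng(0).standard_normal(n_fft) * np.hamming(n_fft)
    print(mel_scale_rp(frame, n_fft, sr).shape)  # (80,)
```

With a 16 kHz signal and a 512-point FFT, this sketch reduces the 257 RP values per component to 40 Mel bands, giving an 80-dimensional vector per frame.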

“…However, the detailed Mel-scale RP information extraction and analysis were not described in [31]. In the present study, we extend our previous paper [31] by performing more experiments and analysis. The contributions of this study are as follows: (1) According to [31], the detection model in the previous work, which used only a training subset of the ASVspoof 2017 database, might not lead to an impressive result on the evaluation of subset-based testing; hence, we incorporate a training subset and development subset to train the model in our system.…”

Section: Introduction (mentioning)

Replay attack detection with auditory filter-based relative phase features

Wang, Phapatanaburi, et al. 2019

J AUDIO SPEECH MUSIC PROC.

Self Cite

There are many studies on distinguishing human speech from artificially generated speech and on automatic speaker verification (ASV), which aims to determine whether a given utterance belongs to a given speaker. Recent studies demonstrate the success of the relative phase (RP) feature in speaker recognition/verification and in the detection of synthesized and converted speech. However, few studies focus on the RP feature for replay attack detection. In this paper, we improve the discriminating ability of the RP feature by proposing two new auditory filter-based RP features for replay attack detection. The key idea is to combine the signal-representation advantage of RP-based features with the advantage of auditory filter banks. For the first proposed feature, we apply a Mel-filter bank to convert the signal representation of conventional RP information from a linear scale to a Mel scale; the modified representation is called the Mel-scale RP feature. For the other proposed feature, a gammatone filter bank is applied to scale the RP information; the scaled RP feature is called the gammatone-scale RP feature. These two proposed phase-based features are designed to achieve better performance than a conventional RP feature because of the scale resolution and … In addition to the individual Mel/gammatone-scale RP features, a combination of the scores of these proposed RP features and a standard magnitude-based feature, the constant Q transform cepstral coefficient (CQCC), is also applied to make the detection decision more reliable. The effectiveness of the proposed Mel-scale RP feature, the gammatone-scale RP feature, and their combination is evaluated on the ASVspoof 2017 dataset. On the evaluation set, the proposed methods demonstrate significant improvement over the existing feature and the baseline CQCC feature.
The combination of the CQCC and gammatone-scale RP provides the best performance compared with an individual baseline feature and other combination methods.
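The score combination mentioned in this abstract can be illustrated with a weighted linear sum of per-utterance scores, with the weight chosen on development data. This is an assumed formulation, L = (1 − α)·L_CQCC + α·L_RP, where each L is a detection score such as a GMM log-likelihood ratio (higher meaning more likely genuine); the abstract does not state the exact rule. The EER below is a simple threshold-sweep approximation, and the function names and α grid are hypothetical.

```python
import numpy as np


def fuse_scores(s_cqcc, s_rp, alpha):
    """Linear score-level fusion; alpha is the weight of the RP-based score."""
    return (1.0 - alpha) * np.asarray(s_cqcc, float) + alpha * np.asarray(s_rp, float)


def approx_eer(scores, labels):
    """Approximate EER: min over thresholds of max(false accept, false reject).

    labels: 1 = genuine, 0 = replayed; higher score = more likely genuine.
    """
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, int)
    genuine, spoof = scores[labels == 1], scores[labels == 0]
    best = 1.0
    for t in np.unique(scores):
        far = np.mean(spoof >= t)   # replayed utterances accepted
        frr = np.mean(genuine < t)  # genuine utterances rejected
        best = min(best, max(far, frr))
    return best


def tune_alpha(dev_cqcc, dev_rp, dev_labels, grid=np.linspace(0.0, 1.0, 11)):
    """Pick the fusion weight with the lowest development-set EER."""
    return min(grid, key=lambda a: approx_eer(fuse_scores(dev_cqcc, dev_rp, a), dev_labels))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    cqcc = [0.0, 1.0, 0.0, 1.0]  # toy scores: CQCC alone cannot separate the classes
    rp = [1.0, 1.0, 0.0, 0.0]    # toy scores: RP separates them
    a = tune_alpha(cqcc, rp, labels)
    print(a, approx_eer(fuse_scores(cqcc, rp, a), labels))
```

Tuning α on held-out development scores, rather than on the evaluation set, keeps the reported evaluation result unbiased; the toy scores above are invented for illustration only.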

“…• Spectral centroid-based features: which include subband spectral centroid frequency coefficients and subband spectral centroid magnitude coefficients [12] and spectral centroid deviation [16]. • Phase-based features: which include instantaneous frequency cosine coefficient [44,45] and modified group delay cepstral coefficient [15]. • Zero time windowing-based features: zero time windowing cepstral coefficients [46,47].…”

Section: Prediction Cepstral Coefficients-based Features (mentioning)

“…Since the ASVspoof 2017 challenge [1,2], more and more researchers begin to focus on playback speech detection [3][4][5][6][7][8][9][10]. Similar to many speech signal processing systems, most of all playback speech detection systems usually consist of front-end feature and back-end classifier [11][12][13][14][15][16][17][18]. For the end-to-end systems such as…”

Section: Introduction (mentioning)

Discriminative features based on modified log magnitude spectrum for playback speech detection

Yang, Ren, et al. 2020

J AUDIO SPEECH MUSIC PROC.

To improve the performance of hand-crafted features for detecting playback speech, this work proposes two discriminative features: constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients. They rely on our finding that the variance-based and mean-based modified log magnitude spectra enhance the discriminative power between genuine speech and playback speech. Constant-Q variance-based octave coefficients (constant-Q mean-based octave coefficients) are obtained by combining the variance-based (mean-based) modified log magnitude spectrum, octave segmentation, and the discrete cosine transform. Finally, both features are evaluated on the ASVspoof 2017 corpus version 2.0 and the ASVspoof 2019 physical access database. Experimental results show that the variance-based and mean-based modified log magnitude spectra produce discriminative features for playback speech detection. Further results on the two databases show that constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients perform better than some common features, such as mel frequency cepstral coefficients and constant-Q cepstral coefficients.
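
The last two steps of the pipeline in this abstract, octave segmentation followed by the discrete cosine transform, can be sketched as below. This is a hedged illustration only: the variance-based or mean-based modification of the log magnitude spectrum is not reconstructed here, so the input is assumed to be an already-modified constant-Q log-magnitude vector for one frame. Applying an orthonormal DCT-II within each octave and keeping the first few coefficients is one plausible reading of "octave segmentation"; the function names, the 96 bins per octave (borrowed from a common CQCC configuration), and the number of kept coefficients are assumptions.

```python
import numpy as np


def dct2_ortho(x):
    """Orthonormal DCT-II of a 1-D array."""
    x = np.asarray(x, dtype=float)
    n_pts = x.size
    n = np.arange(n_pts)
    basis = np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / n_pts)  # rows: k, cols: n
    out = basis @ x
    out[0] *= np.sqrt(1.0 / n_pts)
    out[1:] *= np.sqrt(2.0 / n_pts)
    return out


def octave_coefficients(mod_log_spec, bins_per_octave=96, n_coeffs=4):
    """Split a constant-Q spectrum into octaves and keep n_coeffs DCT coefficients each.

    Trailing bins that do not fill a whole octave are ignored (an assumption).
    """
    spec = np.asarray(mod_log_spec, dtype=float)
    n_oct = spec.size // bins_per_octave
    if n_oct == 0 or not 1 <= n_coeffs <= bins_per_octave:
        raise ValueError("need at least one full octave and 1 <= n_coeffs <= bins_per_octave")
    segments = spec[: n_oct * bins_per_octave].reshape(n_oct, bins_per_octave)
    return np.concatenate([dct2_ortho(seg)[:n_coeffs] for seg in segments])


if __name__ == "__main__":
    frame = np.random.default_rng(0).standard_normal(9 * 96)  # stand-in for 9 octaves
    print(octave_coefficients(frame).shape)  # (36,)
```

Keeping only the low-order DCT coefficients of each octave summarizes that octave's spectral shape compactly, so the frame vector grows with the number of octaves rather than with the number of constant-Q bins.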

Introduction

Sun, Wang 2023

SpringerBriefs in Computer Science
