Interspeech 2018
DOI: 10.21437/interspeech.2018-2001

Multiple Phase Information Combination for Replay Attacks Detection

Abstract: In recent years, the performance of Automatic Speaker Verification (ASV) systems has improved significantly. However, they are still affected by different kinds of spoofing attacks. In this paper, we propose a method that fuses different phase features and amplitude features to detect replay attacks. We apply the mel-scale relative phase feature and the source-filter vocal tract feature in the phase domain for replay attack detection. These two phase-based features are combined to obtain complementary information. I…

Cited by 25 publications (16 citation statements)
References 22 publications (32 reference statements)
“…In the present study, we extend our previous paper [31] by performing more experiments and analysis. The contributions of this study are as follows: (1) According to [31], the detection model in the previous work, which used only a training subset of the ASVspoof 2017 database, might not lead to an impressive result on the evaluation of subset-based testing; hence, we incorporate a training subset and development subset to train the model in our system. (2) In addition to the Mel-filter bank represented to scale the RP information in the previous work, we also apply a gammatone filter bank, thereby simulating the auditory system of humans to convert the important information of the full-band RP-based feature in the gammatone scale, where the RP feature with the reduced dimension is called the gammatone-scale RP.…”
Section: Introduction (supporting)
confidence: 54%
“…Additionally, preliminary experiments have indicated that the Mel-scale RP provides better performance compared with the MGD cepstral coefficient (MGDCC) and MFCC, and CQCC. However, the detailed Mel-scale RP information extraction and analysis were not described in [31]. In the present study, we extend our previous paper [31] by performing more experiments and analysis.…”
Section: Introduction (mentioning)
confidence: 72%
“…• Spectral centroid-based features: which include subband spectral centroid frequency coefficients and subband spectral centroid magnitude coefficients [12] and spectral centroid deviation [16]. • Phased-based features: which include instantaneous frequency cosine coefficient [44,45] and modified group delay cepstral coefficient [15]. • Zero time windowing-based features: zero time windowing cepstral coefficients [46,47].…”
Section: • Prediction Cepstral Coefficients-based Features (mentioning)
confidence: 99%
“…Since the ASVspoof 2017 challenge [1,2], more and more researchers begin to focus on playback speech detection [3][4][5][6][7][8][9][10]. Similar to many speech signal processing systems, most of all playback speech detection systems usually consist of front-end feature and back-end classifier [11][12][13][14][15][16][17][18]. For the end-to-end systems such as …”
Section: Introduction (mentioning)
confidence: 99%