Interspeech 2018
DOI: 10.21437/interspeech.2018-1494

Decision-level Feature Switching as a Paradigm for Replay Attack Detection

Abstract: A pre-recorded audio sample of an authentic speaker presented to a voice-based biometric system is termed a replay attack. Such attacks can be detected by identifying the characteristics of the recording device and environment. An analysis of different recording devices indicates that each recording device affects the spectrum differently. It is also observed that each feature captures specific characteristics of recording devices. In particular, Mel Filterbank Slope (MFS) captures low-frequency information…

Citations: cited by 14 publications (5 citation statements)
References: 20 publications

“…From Table 5, it can be seen that CQDC-D based system far outperforms most of existing systems for playback attack detection. However, the performance of our system based on CQDC-D is worse than the systems based on MFCC|LFS|MFS and CQCC|MFCC|LFS|MFS [13]. The reason is that the systems based on MFCC|LFS|MFS and CQCC|MFCC|LFS|MFS are based on decision-level feature switching.…”
Section: Comparison With Some Known Systems (mentioning)
confidence: 78%
“…In which, CQCCE represents a combination feature by combining CQCC and log energy [8], qDFTspec represents DFT spectrum in q-log domain [11], CMPOC and CQSPIC represent constant-Q magnitude-phase octave coefficients [9] and constant-Q statistics-plus-principal information coefficients [12], respectively. In addition, MFCC represents mel-frequency cepstral coefficients, LFS and MFS represent linear filterbank slope and mel filterbank slope [13], respectively.…”
Section: Comparison With Different Dimensions (mentioning)
confidence: 99%
“…• Discrete Fourier transform (DFT) based features: which include Mel frequency cepstral coefficients (MFCC) [4,13,36], mel filterbank slope [10], linear filterbank slope [10], and Q-log domain DFT-based mean normalized log spectral [42]. • Variable length energy separation algorithm (VESA)-based features: which include instantaneous frequency cosine coefficients based on VESA [6] and instantaneous amplitude cosine coefficients based on VESA [43].…”
Section: Related Work (mentioning)
confidence: 99%
“…Since the ASVspoof 2017 challenge [1,2], more and more researchers begin to focus on playback speech detection [3][4][5][6][7][8][9][10]. Similar to many speech signal processing systems, most of all playback speech detection systems usually consist of front-end feature and back-end classifier [11][12][13][14][15][16][17][18].…”
Section: Introduction (mentioning)
confidence: 99%
“…On the other hand, the need to select informative feature representations arises when building a system with a growing number of features. As a consequence, choosing fewer but sufficient, complementary features is of supreme importance [10]. In this paper, we focus on resolving the two challenges mentioned above, especially the first one.…”
Section: Introduction (mentioning)
confidence: 99%