Interspeech 2019
DOI: 10.21437/interspeech.2019-1760

Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features

Abstract: We present our system submission to the ASVspoof 2019 Challenge Physical Access (PA) task. The objective for this challenge was to develop a countermeasure that identifies speech audio as either bona fide or intercepted and replayed. The target prediction was a value indicating that a speech segment was bona fide (positive values) or "spoofed" (negative values). Our system used convolutional neural networks (CNNs) and a representation of the speech audio that combined x-vector attack embeddings with signal pro…

Cited by 18 publications (10 citation statements)
References 26 publications

“…For the ASVspoof 2019 PA scenario, 50 systems were submitted [19]. Many countermeasures used DNNs such as the CNN, light-CNN (LCNN), and residual network (ResNet) as backend systems [20], [21], [22], [23], [24], [25], [26]. For input features, spectrogram and phase information [22], [27], linear frequency cepstral coefficients (LFCC) [18], constant Q cepstral coefficients (CQCC) [17], Mel-frequency cepstral coefficients (MFCC), inverted MFCC (IMFCC) [28], and rectangular filter cepstral coefficients (RFCC) [29] were adopted.…”
Section: ASVspoof 2019 Results (mentioning)
confidence: 99%
“…However, recent studies have shown that a well-trained ASV system could be deceived by malicious attacks [1][2][3]. In the last decade, the speaker verification community held several ASVspoof challenge competitions [4][5][6] to develop countermeasures mainly against replay [7,8], speech synthesis [9,10] and voice conversion [10,11] attacks.…”
Section: Introduction (mentioning)
confidence: 99%
“…A separate detection countermeasure has the following advantages: 1) It separates the defense part and speaker verification into two independent stages, which avoids retraining a well-developed ASV model. 2) Since most existing countermeasures for replay and synthetic speech attacks are based on a separate detection network [7][8][9], the proposed approach provides the feasibility to develop a unified countermeasure against all spoofing attacks.…”
Section: Introduction (mentioning)
confidence: 99%
“…In which, the corpus has three subset, train, development, and evaluation set. According to ASVspoof 2019 challenge rule, tandem detection cost function (t-DCF) [56] and EER are used as the primary and secondary metric, respectively, which is the same as the previous works [57][58][59][60][61][62][63][64]. Table 10 gives the experimental results on ASVspoof 2019 physical access development set using dynamic features of CMOC and CVOC.…”
Section: Database Introduction and Evaluation Metric (mentioning)
confidence: 99%
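
The last statement names EER as the secondary ASVspoof 2019 metric, and the abstract describes scores that are positive for bona fide speech and negative for spoofed speech. As a rough illustration only, the sketch below estimates EER from two lists of such scores by sweeping each observed score as a threshold. The function name `equal_error_rate` and the example scores are hypothetical, this is not the official challenge scoring tool, and t-DCF is omitted because it also depends on the error rates of an ASV system and on cost parameters not given here.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Assumed convention: a higher score means "more likely bona fide",
    and a trial is accepted when score >= threshold.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need at least one score from each class")

    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of the two rates where they are closest.
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Made-up scores for illustration: one bona fide trial scores below zero
# and one spoofed trial scores above zero.
bonafide = [2.0, 1.5, 0.5, -0.2]
spoofed = [-1.8, -1.0, -0.4, 0.7]
print(equal_error_rate(bonafide, spoofed))
```

With only a handful of trials the two error rates may never cross exactly, so the midpoint at the closest threshold is an approximation; scoring tools typically interpolate along the error curve instead.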