Interspeech 2019
DOI: 10.21437/interspeech.2019-1991

Replay Attack Detection with Complementary High-Resolution Information Using End-to-End DNN for the ASVspoof 2019 Challenge

Abstract: In this study, we concentrate on replacing the process of extracting hand-crafted acoustic features with an end-to-end DNN using complementary high-resolution spectrograms. As a result of advances in audio devices, the typical characteristics of replayed speech based on conventional knowledge alter or diminish in unknown replay configurations. Thus, it has become increasingly difficult to detect spoofed speech with a conventional knowledge-based approach. To detect unrevealed characteristics that reside in a replayed…

Cited by 35 publications (12 citation statements)
References: 25 publications
“…For the ASVspoof 2019 PA scenario, 50 systems were submitted [19]. Many countermeasures used DNNs such as the CNN, light-CNN (LCNN), and residual network (ResNet) as backend systems [20], [21], [22], [23], [24], [25], [26]. For input features, spectrogram and phase information [22], [27], linear frequency cepstral coefficients (LFCC) [18], constant Q cepstral coefficients (CQCC) [17], Mel-frequency cepstral coefficients (MFCC), inverted MFCC (IMFCC) [28], and rectangular filter cepstral coefficients (RFCC) [29] were adopted.…”
Section: ASVspoof 2019 Results (mentioning)
confidence: 99%
“…A lot of countermeasures using DNN have been proposed for ASVspoof 2019 [19]. One of these countermeasures used high-resolution spectrograms as input features, and CNN and gated recurrent unit (GRU) were used as a classifier, and this countermeasure was named CNN-GRU [20]. The DNN architecture of CNN-GRU is composed of convolutional layers, pooling layers, ResNet layers, and a GRU layer.…”
Section: CNN-GRU for RAD Methods (mentioning)
confidence: 99%
“…However, SV systems are known to be vulnerable to various presentation attacks, such as replay attacks, voice conversion, and speech synthesis. These vulnerabilities have inspired research into presentation attack detection (PAD), which classifies given utterances as spoofed or not spoofed [6][7][8], where many DNN-based systems have achieved promising results [9][10][11]. Table 1 demonstrates the vulnerability of conventional SV systems when faced with presentation attacks.…”
Section: Introduction (mentioning)
confidence: 99%
“…Comparison of proposed system with existing systems (✔ indicates that a particular attack is addressed; ✖ indicates that a particular attack is not addressed) attacks [27]. Jung et al. [46] trained a deep neural network model with 7 spectrograms, i-vectors, and raw waveforms only for replay attack detection. Table…”
mentioning
confidence: 99%