Interspeech 2019
DOI: 10.21437/interspeech.2019-1794

ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual Networks

Abstract: We present JHU's system submission to the ASVspoof 2019 Challenge: Anti-Spoofing with Squeeze-Excitation and Residual neTworks (ASSERT). Anti-spoofing has gathered more and more attention since the inauguration of the ASVspoof Challenges, and ASVspoof 2019 dedicates to address attacks from all three major types: text-to-speech, voice conversion, and replay. Built upon previous research work on Deep Neural Network (DNN), ASSERT is a pipeline for DNN-based approach to anti-spoofing. ASSERT has four components: f…

Cited by 106 publications (58 citation statements)
References: 31 publications
“…Furthermore, attention-based models have been studied in [61,62] during the ASVspoof 2019 challenge. It is also worth noting that the best performing models on the ASVspoof challenges used fusion approaches, either at the classifier output or the feature level [57,76,15], indicating the challenges in designing a single countermeasure capable of capturing all the variabilities that may appear in wild test conditions in a presentation attack. Please refer to Table 1 for details.…”
Section: Related Work (mentioning)
confidence: 99%
“…System #1 refers to the proposed architecture that jointly optimizes SID, PAD, and ISV loss (see Figure 1a). System #2-SE is the result of applying squeeze-excitation (SE) [26] based on its recent application to PAD [9]. System #3 describes the result of assigning three max feature map (MFM) blocks [18] for SID as well as for PAD after the first three MFM blocks.…”
Section: Experimental Configurations (mentioning)
confidence: 99%
“…However, SV systems are known to be vulnerable to various presentation attacks, such as replay attacks, voice conversion, and speech synthesis. These vulnerabilities have inspired research into presentation attack detection (PAD), which classifies given utterances as spoofed or not spoofed [6][7][8], where many DNN-based systems have achieved promising results [9][10][11]. Table 1 demonstrates the vulnerability of conventional SV systems when faced with presentation attacks.…”
Section: Introduction (mentioning)
confidence: 99%
“…Experiments were carried out using the following three CNN variants: (1) ResNet18 [22,24]; (2) SENet50 (Squeeze-Excitation Network) [24]; and (3) Light CNN (LCNN) [16]. The model parameters and architectures of ResNet18 and SENet50 are shown in Table 1.…”
Section: Experimental Settings (mentioning)
confidence: 99%