Interspeech 2019
DOI: 10.21437/interspeech.2019-2170

The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge

Cited by 45 publications (24 citation statements)
References: 0 publications

“…During testing, we use the CNN output activation (sigmoid activation) as our spoof detection score. Though another recent study also used VAEs for feature extraction [40], our approach is different; the authors of [40] used the latent variable from a pretrained VAE model, while we use the residual of the original and reconstructed inputs. Table 9 summarizes the results.…”
Section: VAE as a Feature Extractor (mentioning)
confidence: 99%
“…In this study, we use long-term CQT based log power spectrum (LPS) as input to the LCNN system similar to that in [28]. The static dimension of LPS is 84, where the number of octaves is 7 and the number of frequency bins in every octaves is 12.…”
Section: Methods (mentioning)
confidence: 99%
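
The 84-dimensional figure in the statement above follows from 7 octaves times 12 bins per octave. A minimal sketch of that arithmetic and of the log power mapping is given here; the lowest bin frequency `F_MIN` and the helper names are assumptions for illustration and are not taken from the cited paper, which may configure its CQT differently.

```python
import math

N_OCTAVES = 7          # from the quoted statement
BINS_PER_OCTAVE = 12   # from the quoted statement
F_MIN = 32.70          # Hz; ASSUMED lowest bin, not given in the source


def cqt_center_frequencies(f_min=F_MIN, n_octaves=N_OCTAVES,
                           bins_per_octave=BINS_PER_OCTAVE):
    """Return geometrically spaced CQT bin center frequencies in Hz."""
    n_bins = n_octaves * bins_per_octave
    # Each bin is a factor of 2^(1/bins_per_octave) above the previous one.
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def log_power(magnitudes, eps=1e-10):
    """Map the CQT magnitudes of one frame to a log power spectrum.

    eps is an assumed small floor that avoids log(0) on silent bins.
    """
    return [math.log(m * m + eps) for m in magnitudes]


freqs = cqt_center_frequencies()
print(len(freqs))  # 84 static dimensions: 7 octaves * 12 bins per octave
```

With this spacing, every 12th bin doubles in frequency, so the 84 bins cover exactly 7 octaves above the chosen lowest frequency.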
“…Later, the constant-Q cepstral coefficients (CQCC) [14] derived from long-term constant-Q transform (CQT) emerged as a promising front-end that led to proposal of several handcrafted features along that direction [15][16][17][18]. In the recent years, robust deep learning classifiers such as squeeze excitation residual networks [19,20] and end-to-end systems with light convolutional neural networks (LCNN) [21,22] are found to be effective for detection of spoofing attacks.…”
Section: Introduction (mentioning)
confidence: 99%
“…We adopt the Light CNN architecture as the discriminator, which was the best system in the ASVspoof 2017 Challenge [20]. It also performed well in the ASVspoof 2019 Challenge in both replay and synthetic speech discrimination sub-tasks [21,22]. The detailed model structure is the same as that of our previous work [23].…”
Section: Synthesized Speech Discriminator Setup (mentioning)
confidence: 99%