2022
DOI: 10.48550/arxiv.2202.12233
Preprint

Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

Hemlata Tak,
Massimiliano Todisco,
Xin Wang
et al.

Abstract: The performance of spoofing countermeasure systems depends fundamentally upon the use of sufficiently representative training data. With this usually being limited, current solutions typically lack generalisation to attacks encountered in the wild. Strategies to improve reliability in the face of uncontrolled, unpredictable attacks are hence needed. We report in this paper our efforts to use self-supervised learning in the form of a wav2vec 2.0 front-end with fine tuning. Despite initial base representations b…

Cited by 9 publications (10 citation statements)
References 54 publications
“…The current solutions leverage end-to-end deep neural networks (DNNs) [12,13], trying to distinguish artifacts and unnatural cues of spoofing speech from bona fide speech. And thanks to a series of challenges and datasets [1][2][3][4], many novel techniques were introduced to achieve promising CM performances [12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27][28].…”
Section: Introduction
confidence: 99%
“…One good example is data augmentation, a method that augments the training set using waveforms processed with codec or other signal processing operators [16,17,18]. Another approach is to replace conventional feature extractors with a pre-trained self-supervised-learning (SSL) DNN [15,19]. This DNN is pre-trained on a huge amount of diverse bona fide speech data through SSL [20], and it is able to extract features robust to various channel conditions, languages, and speakers.…”
Section: Introduction
confidence: 99%
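The codec-style augmentation described in the statement above can be sketched with a simple µ-law companding pass. This is an illustrative stand-in, not the augmentation pipeline of the cited works, which apply real codecs and other signal-processing operators to training waveforms; the function name and bit depth are assumptions.

```python
import math

def mu_law_codec(samples, mu=255, bits=8):
    """Pass a waveform through a simulated lossy telephony codec
    (mu-law companding plus quantisation) -- a cheap stand-in for the
    codec-based augmentation described above. `samples` are floats in
    [-1, 1]; the return value is the degraded waveform."""
    levels = 2 ** bits
    out = []
    for x in samples:
        # Compress: mu-law companding (G.711-style).
        y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
        # Quantise to the codec's bit depth.
        q = round((y + 1.0) / 2.0 * (levels - 1))
        y_hat = q / (levels - 1) * 2.0 - 1.0
        # Expand back to the linear amplitude domain.
        out.append(math.copysign(math.expm1(abs(y_hat) * math.log1p(mu)) / mu, y_hat))
    return out
```

Training on both the clean and codec-processed copies of each utterance exposes the countermeasure to channel artifacts it would otherwise never see.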
“…More recent studies investigated the effect of self-supervised front-ends as speech spoofing countermeasures [22,23]. Self-supervised learning has established itself as a powerful framework for learning general data representations from unlabeled data [24,25,26].…”
Section: Introduction
confidence: 99%
“…Self-supervised learning has established itself as a powerful framework for learning general data representations from unlabeled data [24,25,26]. Tak et al. [22] used the wav2vec 2.0 model as the front-end of the existing AASIST [17] countermeasure network. They used the output features of XLS-R [27] to improve the performance of AASIST.…”
Section: Introduction
confidence: 99%
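The front-end/back-end composition described in this last statement can be sketched structurally. The encoder below is a random-weight stand-in for a pre-trained, fine-tuned wav2vec 2.0 / XLS-R model, and the scoring head stands in for the AASIST back-end; all names, dimensions, and weights here are illustrative assumptions, not the cited systems.

```python
import random

FRAME = 320     # samples per feature frame (20 ms at 16 kHz)
FEAT_DIM = 8    # stand-in feature size (XLS-R emits larger frame vectors)

def ssl_frontend(waveform):
    """Map raw samples to frame-level feature vectors. A real
    countermeasure would run a pre-trained, fine-tuned wav2vec 2.0
    encoder here; this stand-in uses a fixed random projection so the
    sketch stays self-contained."""
    rng = random.Random(0)
    proj = [[rng.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(FRAME)]
    feats = []
    for start in range(0, len(waveform) - FRAME + 1, FRAME):
        frame = waveform[start:start + FRAME]
        feats.append([sum(frame[i] * proj[i][d] for i in range(FRAME))
                      for d in range(FEAT_DIM)])
    return feats

def countermeasure_score(feats):
    """Stand-in back-end: mean-pool the frame features and apply a
    fixed linear head to get a scalar detection score. AASIST instead
    models spectro-temporal graphs over these features; only the
    interface (frame features in, score out) is the same."""
    rng = random.Random(1)
    w = [rng.uniform(-1, 1) for _ in range(FEAT_DIM)]
    pooled = [sum(f[d] for f in feats) / len(feats) for d in range(FEAT_DIM)]
    return sum(p * wi for p, wi in zip(pooled, w))
```

The point of the design is that swapping a hand-crafted front-end for the SSL encoder is a one-line change at the call site: `score = countermeasure_score(ssl_frontend(waveform))`.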