ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
DOI: 10.1109/icassp43922.2022.9746722

FastAudio: A Learnable Audio Front-End For Spoof Speech Detection


Citations: cited by 24 publications (6 citation statements)
References: 17 publications
“…As test sets of ADD 2022 include unseen genuine and fake utterances which are not present in train and development data, it is essential to develop CMs that are robust to out-of-domain data. Data augmentation strategy is efficient to improve the performance of anti-spoofing systems on cross-dataset in previous works [23][24][25][26][27][28][29][30][31][32][33]. Thus, we design low-quality data augmentation strategy to address the unseen genuine and fake utterances.…”
Section: Data Augmentation
Citation type: mentioning
confidence: 99%
“…For (1), parametric filters in the frequency domain need few parameters (e.g. centre frequency, bandwidth) [8,12,13,14], but computation requires the STFT and the additional design choices of window function and framing settings (which we wish to avoid). Time-domain filterbanks have become more common [5,6,15] and are highly suitable for use with CNNs [16].…”
Section: Learnable Frontends
Citation type: mentioning
confidence: 99%
“…Speech filters are trained alongside the rest of the neural network to optimise the considered objective [28]. For instance, in FastAudio, triangular FBanks are initialized following the standard mel-scale, before adapting their central frequencies and frequency bands during training.…”
Section: AFEs Inspired by Speech Processing
Citation type: mentioning
confidence: 99%
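
The last statement above describes triangular filters that start on the mel scale and are then adjusted through their centre frequencies and bandwidths. A minimal sketch of that parameterisation follows. It is not the FastAudio authors' implementation: the function names, the symmetric-triangle simplification (standard mel triangles are asymmetric in Hz), and the 20-filter, 16 kHz setting are all assumptions made for illustration. In a real front-end the two parameter lists would be trainable tensors in a deep-learning framework; here they are plain Python floats.

```python
import math


def hz_to_mel(f_hz):
    # Common HTK-style mel formula.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_init(n_filters, f_min, f_max):
    """Return (centers, bandwidths) in Hz for mel-spaced filters.

    These two lists are the quantities a learnable front-end would
    update during training (hypothetical naming).
    """
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    # n_filters + 2 points: lower edge, one centre per filter, upper edge.
    pts = [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]
    centers = pts[1:-1]
    bandwidths = [pts[i + 2] - pts[i] for i in range(n_filters)]
    return centers, bandwidths


def triangular_response(freq_hz, center, bandwidth):
    """Symmetric triangle (assumption): 1 at the centre, 0 beyond
    center +/- bandwidth / 2."""
    return max(0.0, 1.0 - abs(freq_hz - center) / (bandwidth / 2.0))


def apply_filterbank(power_spectrum, sample_rate, centers, bandwidths):
    """Weight one frame of an STFT power spectrum by each triangle."""
    n_bins = len(power_spectrum)
    bin_hz = (sample_rate / 2.0) / (n_bins - 1)
    energies = []
    for c, b in zip(centers, bandwidths):
        energies.append(sum(
            triangular_response(k * bin_hz, c, b) * p
            for k, p in enumerate(power_spectrum)
        ))
    return energies


centers, bandwidths = mel_init(n_filters=20, f_min=0.0, f_max=8000.0)
flat_frame = [1.0] * 257  # stand-in for one 512-point STFT power frame
energies = apply_filterbank(flat_frame, 16000, centers, bandwidths)
```

Because each filter is described by only two numbers, the whole bank has 2 x 20 = 40 parameters in this setting, which matches the "few parameters" point made in the Learnable Frontends statement. The sketch also shows why that statement mentions the STFT: the triangles are applied to a power spectrum, so window and framing choices still have to be made upstream.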