2022
DOI: 10.48550/arxiv.2207.12308
Preprint
FAD: A Chinese Dataset for Fake Audio Detection

Abstract: Fake audio detection is a growing concern, and some relevant datasets have been designed for research. But there is no standard public Chinese dataset under additive noise conditions. In this paper, we aim to fill in the gap and design a Chinese fake audio detection dataset (FAD) for studying more generalized detection methods. Twelve mainstream speech generation techniques are used to generate fake audio. To simulate real-life scenarios, three noise datasets are selected for noise addition at five different…

Cited by 3 publications (3 citation statements)
References 26 publications
“…Moreover, the fake audio was generated with 11 Mandarin TTS systems and 2 Mandarin VC systems; durations are set randomly between 2 and 10 s, with a sampling rate of 16 kHz [8]. The FAD dataset uses 12 mainstream voice deepfake techniques, such as STRAIGHT, LPCNet, and HifiGAN, to generate fake audio, and draws real audio from six corpora, including AISHELL1, AISHELL3, and THCHS-30 [26]. The evaluation set of FAD contains 14,000 utterances generated by four unknown deepfake methods not included in the training and validation sets, which better tests a model's robustness and generalization against unknown attacks.…”
Section: Datasets and Evaluation Metrics
confidence: 99%
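The unknown-attack evaluation protocol quoted above can be sketched in a few lines: utterances are partitioned so the evaluation set contains only generation methods absent from training and validation. This is a minimal illustration, not the authors' code; the method names and the `samples` structure are hypothetical.

```python
# Hedged sketch of a FAD-style split: evaluation holds only utterances
# from held-out generation methods, so detectors are tested on attacks
# they never saw during training. All names below are illustrative.

def split_by_unseen_methods(samples, eval_methods):
    """Partition (path, method) pairs so that the evaluation partition
    contains only the held-out generation methods."""
    train_val, evaluation = [], []
    for path, method in samples:
        if method in eval_methods:
            evaluation.append((path, method))
        else:
            train_val.append((path, method))
    return train_val, evaluation

samples = [
    ("a.wav", "LPCNet"),
    ("b.wav", "HifiGAN"),
    ("c.wav", "STRAIGHT"),
    ("d.wav", "unknown_vc"),  # hypothetical held-out method
]
train_val, evaluation = split_by_unseen_methods(samples, {"unknown_vc"})
```

By construction, the method sets of the two partitions are disjoint, which is what makes the reported evaluation a test of generalization rather than memorization.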
“…2. Out-of-domain: We trained the Mandarin network with FAD (Ma et al., 2022), another Mandarin-language dataset. We used the pre-trained ASVspoof network (Delgado et al., 2021) for English-language evaluation.…”
Section: Benchmarking Against Automated Deepfake Detectors
confidence: 99%
“…A potential solution lies in the ML-driven detection of such deepfakes using, for example, binary classifiers to discriminate between genuine/bonafide and AI-generated speech. The field has witnessed a surge in research, from the creation of extensive datasets [10,11,12,13,14,15,16] to the development of new detection models [17,18,19,20,21,22]. Most notably, initiatives such as ASVspoof [23,24,25], launched to benchmark competing detection solutions, show seemingly impressive progress: ever-lower state-of-the-art error rates are reported on a regular basis [21,22,26].…”
Section: Introduction
confidence: 99%
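The error rates mentioned above for bonafide-vs-spoof classifiers are typically reported as the equal error rate (EER), the operating point where the miss rate equals the false-accept rate. The sketch below, with toy scores rather than any real system's output, shows one common way to approximate it by sweeping the decision threshold.

```python
# Hedged sketch: approximating the equal error rate (EER) for a binary
# bonafide-vs-spoof detector. Scores and labels are toy values.

def compute_eer(scores, labels):
    """EER from detection scores (higher = more bonafide-like) and
    labels (1 = bonafide, 0 = spoof), by sweeping the threshold over
    the sorted scores and finding where FNR and FPR cross."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n_pos = sum(labels)               # bonafide count
    n_neg = len(labels) - n_pos       # spoof count
    best_gap, eer = 1.0, 1.0
    fn = 0  # bonafide rejected so far
    tn = 0  # spoof rejected so far
    for _, label in pairs:
        if label == 1:
            fn += 1
        else:
            tn += 1
        fnr = fn / n_pos              # miss rate at this threshold
        fpr = (n_neg - tn) / n_neg    # false-accept rate
        gap = abs(fnr - fpr)
        if gap < best_gap:
            best_gap, eer = gap, (fnr + fpr) / 2
    return eer
```

A perfectly separating detector yields an EER of 0; a detector whose score distributions fully overlap approaches 0.5, which is why "lower and lower" EERs are the headline numbers in this literature.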