2023
DOI: 10.1371/journal.pone.0285333

Warning: Humans cannot reliably detect speech deepfakes

Abstract: Speech deepfakes are artificial voices generated by machine learning models. Previous literature has highlighted deepfakes as one of the biggest security threats arising from progress in artificial intelligence due to their potential for misuse. However, studies investigating human detection capabilities are limited. We presented genuine and deepfake audio to n = 529 individuals and asked them to identify the deepfakes. We ran our experiments in English and Mandarin to understand if language affects detection …

Cited by 18 publications (6 citation statements)
References 36 publications
“…This permits for an extremely high accuracy in voice clones in a similar domain to the training data but new advancements and subtle changes in these obscure features could soon make these prediction models obsolete. Indeed, when a high-accuracy prediction model was tested on new, out-of-domain voice clones in a recent study, the prediction accuracy was abysmal (AUC is approximately 25%) [ 10 ]. We aimed to evaluate the use of perceptual features in current and future model implementations by testing model performance on a completely new generator.…”
Section: Discussion (mentioning; confidence: 99%)
“…For example, the previously mentioned tool that achieved 100% accuracy was trained and tested on a data set of deepfakes generated in 2019, which are of much lower quality than the level of deepfakes available in 2023 [ 8 ]. Furthermore, recent work has shown that out-of-domain voice clone detectors (ie, voice detectors applied outside of the data set in which they were applied) had extremely low performance, obtaining an area under the receiver operator curve (AUC) of 25% [ 10 ]. A more robust detection method might involve searching for the absence of biological features in the cloned voice, rather than the presence of digital features [ 11 ].…”
Section: Introduction (mentioning; confidence: 99%)
“…(Farid 2022; Köbis, Doležalová, and Soraperra 2021; Mai et al 2023), and these issues will undoubtedly worsen as technology continues to improve and evolve (Thompson and Hsu 2023; Vynck 2023).…”
Source: Labeling AI-Generated Content: Promises, Perils, and Future Directions (An MIT Exploration of Generative AI • From Novel Chemicals to Opera)
Section: Listen To This Article (mentioning; confidence: 99%)
“…Video, of course, relies on both audio and visual channels, and the role of audio itself should not be underestimated. For example, humans cannot reliably detect speech deepfakes [18]. The above-mentioned body of research demonstrates the need to examine dis-/misinformation from a multimodal perspective.…”
Section: Introduction (mentioning; confidence: 99%)