The emergence of contemporary deepfakes has attracted significant
attention in machine learning research, as artificial intelligence
(AI)-generated synthetic media is difficult to distinguish from genuine
content and increases the risk of misinterpretation. Machine learning
techniques have been extensively studied for automatically detecting
deepfakes; however, human perception has been far less explored.
Malicious deepfakes could ultimately cause public and social problems.
Can we humans correctly perceive the authenticity of the videos we
watch? The answer is far from obvious; this paper therefore aims to
evaluate the human ability to discern deepfake videos
through a subjective study. We present our findings by comparing human
observers to five state-of-the-art audiovisual deepfake detection models.
To this end, we used gamification concepts to provide 110 participants
(55 native English speakers and 55 non-native English speakers) with a
web-based platform where they could access a series of 40 videos (20 real
and 20 fake) to determine their authenticity. Each participant performed
the experiment twice with the same 40 videos in different random orders.
The videos were manually selected from the FakeAVCeleb dataset. We found
that all AI models performed better than humans when evaluated on the
same 40 videos. The study also reveals that, while deceiving humans is
not guaranteed, they tend to overestimate their detection capabilities.
Our experimental results may help benchmark human versus machine
performance, advance forensic analysis, and enable adaptive
countermeasures.