2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw53098.2021.00112
Do Deepfakes Feel Emotions? A Semantic Approach to Detecting Deepfakes Via Emotional Inconsistencies

Abstract: Recent advances in deep learning and computer vision have spawned a new class of media forgeries known as deepfakes, which typically consist of artificially generated human faces or voices. The creation and distribution of deepfakes raise many legal and ethical concerns. As a result, the ability to distinguish between deepfakes and authentic media is vital. While deepfakes can create plausible video and audio, it may be challenging for them to generate content that is consistent in terms of high-level seman…

Cited by 48 publications (35 citation statements)
References 46 publications
“…For example, in [217], the authors detected deepfake videos of humans by analyzing audio and image sequences individually and looking for emotional inconsistencies between them. Audio and image sequences were divided into temporal segments.…”
Section: Discussion and Future Work
confidence: 99%
“…For each image sequence segment, a similar LSTM using facial features was also used to predict emotions. Lin's concordance correlation coefficient [217] then estimated the correlation between the video and audio emotion predictions and flagged inconsistencies between them. We aim to apply a similar concept to image-text cross-modal forensic analysis, with an approach that exploits the alignment between object labels and text, as well as attention regions in an image (such as faces) and named entities in text, to examine the consistency between image and text.…”
Section: Discussion and Future Work
confidence: 99%
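The consistency measure named above is a standard statistic, so it can be sketched directly. The following is a minimal illustration of Lin's concordance correlation coefficient applied to two per-segment emotion prediction series, not the authors' actual implementation; the function name and inputs are assumptions:

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between two 1-D series.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)

    Ranges from -1 to 1; values near 1 indicate the two series agree
    both in correlation and in scale/location, so a low CCC between
    audio- and video-derived emotion predictions signals inconsistency.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))  # population covariance
    return 2 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

# Identical per-segment predictions agree perfectly:
print(lins_ccc([0.1, 0.5, 0.9], [0.1, 0.5, 0.9]))  # -> 1.0
```

Unlike Pearson correlation, the denominator penalizes mean and variance differences between the two series, which is why it suits checking whether two modalities report the *same* emotion trajectory rather than merely correlated ones.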
“…The rapid growth of computer vision and deep learning technology has driven the recently emerged phenomenon of deepfakes (deep learning and fake), which can automatically forge images and videos that humans cannot easily recognize [29-31]. In addition, deepfake techniques offer the possibility of generating unrecognizable images of a person’s face and altering or swapping a person’s face in existing images and videos with another face that exhibits the same expressions as the original face [29].…”
Section: Methods
confidence: 99%
“…For example, it can provide privacy protection in some critical medical applications, such as face deidentification for patients [32]. Further, although deepfake technology can easily manipulate the low-level semantics of visual and audio features, a recent study suggested that it might be difficult for deepfake technology to forge the high-level semantic features of human emotions [31].…”
Section: Methods
confidence: 99%
“…To unify measures of different sample rates, the individual sample values of valence, arousal, and performance were averaged across sequential time windows. Based on literature findings (Katsis et al., 2008; Hosler et al., 2021), a time window of 10 s was determined to be appropriate.…”
Section: Methods
confidence: 99%
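The windowed averaging described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the cited study's code; the `timestamps`/`values` names and the use of NumPy are assumptions:

```python
import numpy as np

def window_average(timestamps, values, window_s=10.0):
    """Average sample values over sequential fixed-length time windows.

    timestamps: sample times in seconds (rates may differ across signals)
    values: one sample (e.g. valence or arousal) per timestamp
    Returns one mean per window, so signals sampled at different rates
    end up on a common 10 s grid.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    values = np.asarray(values, dtype=float)
    bins = (timestamps // window_s).astype(int)  # window index per sample
    return [float(values[bins == b].mean()) for b in range(bins.max() + 1)]

# Samples at 0..19 s fall into two 10 s windows:
t = np.arange(20.0)
v = np.concatenate([np.ones(10), 3 * np.ones(10)])
print(window_average(t, v))  # -> [1.0, 3.0]
```

Averaging per fixed window (rather than resampling) sidesteps interpolation between signals whose native sample rates differ.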