Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413570

Emotions Don't Lie: An Audio-Visual Deepfake Detection Method using Affective Cues

Abstract: We present a learning-based method for detecting real and fake deepfake multimedia content. To maximize information for learning, we extract and analyze the similarity between the two audio and visual modalities from within the same video. Additionally, we extract and compare affective cues corresponding to perceived emotion from the two modalities within a video to infer whether the input video is "real" or "fake". We propose a deep learning network, inspired by the Siamese network architecture and the triple…
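To make the abstract's approach concrete, here is a minimal sketch of the idea: two modality-specific encoders embed audio and visual features into a shared space, and a margin-based contrastive loss (a simplified stand-in for the Siamese/triplet formulation the abstract alludes to) pulls the two modalities together for real videos and apart for fakes. All layer sizes, feature dimensions, and names are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Maps pre-extracted per-modality features into a shared embedding space."""
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

# Hypothetical feature dimensions: e.g. MFCC statistics for audio,
# face-crop CNN features for video.
audio_enc = ModalityEncoder(in_dim=40)
visual_enc = ModalityEncoder(in_dim=512)

def modality_margin_loss(a_emb, v_emb, label, margin: float = 0.5):
    """label = 1 for real videos (modalities should agree),
    label = 0 for fakes (embeddings pushed at least `margin` apart)."""
    d = (a_emb - v_emb).pow(2).sum(dim=-1)   # squared L2 distance per video
    pos = label * d                          # real: minimize cross-modal distance
    neg = (1 - label) * F.relu(margin - d)   # fake: enforce a separation margin
    return (pos + neg).mean()

# Toy batch of 4 videos with random stand-in features.
audio_feats, visual_feats = torch.randn(4, 40), torch.randn(4, 512)
labels = torch.tensor([1.0, 1.0, 0.0, 0.0])
loss = modality_margin_loss(audio_enc(audio_feats), visual_enc(visual_feats), labels)
loss.backward()
# At test time, the cross-modal distance itself can be thresholded
# to classify a video as real or fake.
```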

Cited by 181 publications (32 citation statements) · References 40 publications
“…In this study, we investigate the vulnerabilities of K-12 students, higher education students, teachers, principals, and general adult learners to deepfakes related to climate change and investigate potential population and video characteristics that can be leveraged in mitigation approaches. To date, the anticipated prevalence of deepfakes across societal contexts has motivated a large body of work seeking to develop algorithmic techniques to detect deepfakes [26][27][28][29][30][31][32][33][34][35][36][37] . However, these algorithms exhibit low rates of successful detection and are not robust across deepfake types, content format, content characteristics, and datasets 20,38 .…”
Section: Introduction
confidence: 99%
“…A total of 2,116 teams submitted computer vision models to the competition, and the leading model achieved an accuracy score of 65% on the 4,000 videos in the holdout data, which consisted of half deepfake and half real videos (31, 36). While there are many proposed techniques for algorithmically detecting fakes (including affective computing approaches like examining heart rate and breathing rate (37) and looking for emotion-congruent speech and facial expressions (38, 39)), the most accurate computer vision model in the contest (40) focused on locating faces in a sample of static frames using multitask cascaded convolutional neural networks (41), conducting feature encoding based on EfficientNet B-7 (42), and training the model with a variety of transformations inspired by albumentations (43) and grid mask (44). Based on this model outperforming 2,115 other models to win significant prize money in a widely publicized competition on the largest dataset of deepfakes ever produced, we refer to this winning model as the “leading model” for detecting deepfakes to date.…”
confidence: 99%
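As a rough illustration of that winning pipeline's shape (face detection with a multitask cascaded CNN, feature encoding with EfficientNet-B7, per-video score aggregation), here is a hedged sketch. The library choices (facenet-pytorch, timm), the input resolution, and the untrained classification head are my assumptions, not the competition code.

```python
import torch
import torch.nn.functional as F
import timm
from facenet_pytorch import MTCNN
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
mtcnn = MTCNN(keep_all=True, device=device)  # cascaded CNN face detector
# NOTE: the 1-logit head is randomly initialized here; in practice it is
# fine-tuned on labeled real/fake face crops.
encoder = timm.create_model("tf_efficientnet_b7",
                            pretrained=True, num_classes=1).to(device).eval()

@torch.no_grad()
def video_fake_score(frame_paths):
    """Average sigmoid fake-probability over all faces in sampled frames."""
    scores = []
    for path in frame_paths:
        frame = Image.open(path).convert("RGB")
        faces = mtcnn(frame)                 # (n_faces, 3, 160, 160) or None
        if faces is None:
            continue
        faces = F.interpolate(faces.to(device), size=600)  # B-7 input size
        scores.append(torch.sigmoid(encoder(faces)).mean())
    return torch.stack(scores).mean().item() if scores else 0.5  # 0.5 = no face
```

The training-time augmentations the statement mentions (albumentations-style transforms and grid mask) would be applied to the face crops before encoding; they are omitted from this inference-only sketch.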
“…Compared with the previous work, this method considers eye blinking to detect fake videos, an important physiological feature that can be used to distinguish fake videos. To achieve that, this method uses a convolutional neural network. [41] developed a deep learning framework for detecting deepfakes in multimedia materials. The primary goal of this model is to comprehend and examine the interaction of the audio (speech) and video (visual) modalities.…”
Section: Biological Signals Analysis
confidence: 99%
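The blink cue can be illustrated with the classic eye-aspect-ratio (EAR) heuristic of Soukupová and Čech, used here as a simplified stand-in for the learned temporal model the statement describes. The landmark layout and the 0.21 threshold below are assumptions; landmark extraction (e.g. via dlib or MediaPipe) is assumed to have happened already.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) array of landmarks in the common 68-point ordering.
    EAR drops sharply while the eye is closed."""
    v1 = np.linalg.norm(eye[1] - eye[5])  # vertical distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])   # horizontal distance
    return (v1 + v2) / (2.0 * h)

def blink_rate(ear_series, fps, thresh=0.21):
    """Blinks per minute, counted as downward threshold crossings of EAR."""
    closed = np.asarray(ear_series) < thresh
    blinks = np.count_nonzero(closed[1:] & ~closed[:-1])
    minutes = len(ear_series) / fps / 60.0
    return blinks / minutes

# Humans blink roughly 15-20 times per minute; a rate near zero over a
# long clip is the kind of physiological anomaly this cue exploits.
```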