Proceedings of the 3rd ACM Workshop on Intelligent Cross-Data Analysis and Retrieval 2022
DOI: 10.1145/3512731.3534210

Multimodal Cheapfakes Detection by Utilizing Image Captioning for Global Context

Cited by 17 publications (9 citation statements)
References 12 publications
“…Compared to the existing method for cheapfakes detection, we have proposed a method that takes advantage of attributes from the testing dataset instead of directly alternating and defines handcraft patterns based on human effort. Moreover, we have extended experiments of the same theoretical results previously described [43]. Compared to another approach, our methods achieve competitive results, which achieve equal accuracy and higher recall and F1-score.…”
Section: Discussion
mentioning
confidence: 62%
“…On the other hand, cheap fake defines manipulated content created through more accessible methods (Paris and Donovan, 2019), e.g. changing captions or speed of videos (La et al, 2022). The term fauxtography was first coined in journalism for images manipulated to "convey a questionable (or outright false) sense of the events they seem to depict" (Cooper, 2007; Kalb and Saivetz, 2007).…”
Section: Task Formulation
mentioning
confidence: 99%
“…To obtain text representations, early approaches used combinations of word2vec models (Mikolov et al, 2013), LSTMs (Hochreiter and Schmidhuber, 1997), and TF-IDF scores for n-grams (Jin et al, 2017; Tanwar and Sharma, 2020; Hou et al, 2019). More recent efforts use pretrained language models (Fung et al, 2021; Aneja et al, 2021; La et al, 2022). To encode visual data, many approaches first detect objects in visual content using a Mask R-CNN model (He et al, 2017) before extracting visual features (Aneja et al, 2021; La et al, 2022; Shang et al, 2022).…”
Section: Stage 3: Verdict Prediction
mentioning
confidence: 99%
“…Visual content can be manipulated, for instance, via deepfakes [42] or by combining images from different contexts in a misleading format [43]. While multimodal fake news detection is a developing field, several approaches were already presented in [44,45].…”
mentioning
confidence: 99%