2022
DOI: 10.1007/978-3-031-06433-3_19

Combining EfficientNet and Vision Transformers for Video Deepfake Detection

Cited by 120 publications (50 citation statements)
References 25 publications
“…However, if the blocks of the image are shuffled in a proper way, it can lead to preserving essential characteristics while also enhancing the quality of the model [55], [56]. Additionally, several researches in [57] and [58] have demonstrated that creating patches by using characteristics gathered in an image also increases the quality of the training process. Therefore, these demonstrate that block shifting and shuffling local regions greatly raise quality when applied properly.…”
Section: Data Preprocessing (mentioning)
confidence: 99%
“…The further the generations go, the larger these datasets are, and the more frames they contain. Recently, with the increased focus on the concept of identifying Deepfakes ’in the wild’, a number of further important datasets have emerged, namely OpenForensics [ 26 ] which seeks to provide images containing multiple faces or crowds of people to address the problem of multi-face forgery detection as also reported in [ 27 ] and WildDeepfake [ 28 ], which aims to provide a wide variety of scenarios, situations, techniques and perturbations in the images and videos within them. Finally, a particularly large and varied dataset was presented, namely ForgeryNet [ 29 ], containing millions of images and hundreds of thousands of videos crafted with dozens of manipulation techniques, perturbations and a great variety of scenes and identities.…”
Section: Deepfake Literature Overview (mentioning)
confidence: 99%
“…As the images provided in the challenge dataset are already facial cutouts of the people, this last step was not applied for this specific dataset. Inspired by previous work [ 27 ], participants proposed a mixed architecture between a Cross Vision Transformer [ 52 ] and an EfficientNets [ 53 , 54 ]. When working with Vision Transformers, the first step is to split the input image into several non-overlapping patches of equal size.…”
Section: Researcher Solutions (mentioning)
confidence: 99%
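The patch-splitting step named in the citation statement above (dividing the input image into non-overlapping patches of equal size) can be sketched as follows. This is only an illustration, not the implementation from the cited work: the function name `split_into_patches`, the grayscale nested-list image representation, and the requirement that the image dimensions be divisible by the patch size are all assumptions made for this example.

```python
from typing import List

# Assumption: a grayscale image stored as a list of rows of pixel values.
Image = List[List[int]]


def split_into_patches(image: Image, patch_size: int) -> List[Image]:
    """Split an H x W image into non-overlapping patch_size x patch_size patches.

    Patches are returned in row-major order (left to right, top to bottom).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Take patch_size rows, and patch_size columns from each row.
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


# A toy 4 x 4 image split into four 2 x 2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)
```

In a Vision Transformer pipeline, each such patch would typically then be flattened and linearly projected into a token embedding; that step is omitted here.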
“…Aside from using cryptographic primitives, the literature has proposed detecting deepfakes to combat them [23]. There are a range of benchmarks [9,41] and techniques proposed to detect deepfakes [7,16,40]. While promising, there are a number of challenges with these methods.…”
Section: Related Work (mentioning)
confidence: 99%