2021
DOI: 10.3390/app11167678

High Performance DeepFake Video Detection on CNN-Based with Attention Target-Specific Regions and Manual Distillation Extraction

Abstract: The rapid development of deep learning has produced models that can synthesize hyper-realistic videos, known as DeepFakes. The growth of such forgery data has prompted concerns about its use with malevolent intent. Detecting forgery videos is a crucial subject in the field of digital media. Nowadays, most detection models are based on deep convolutional neural networks and vision transformers; the SOTA model uses an EfficientNetB7 backbone. However, due to the usage of excessively large backbones, these models have the intrinsic dr…

Cited by 27 publications (13 citation statements)
References 35 publications
“…The general architecture of the traditional transformer layer with the self-attention mechanism and the transformer layer with the re-attention mechanism are demonstrated in Figure 4. Mathematical representations of the traditional multi-head self-attention layer and the re-attention mechanism can be written as (7) and (8), respectively [54]. Both methods use a trainable associative memory that maps a query Q and key K-value V pairs to an output by linearly transforming the input.…”
Section: Multi-stream Transformer Block
Mentioning confidence: 99%
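The equations (7) and (8) referenced in this statement are not reproduced on this page. As a minimal sketch, and assuming they follow the usual formulations (per-head self-attention, and the re-attention variant introduced in DeepViT, where Q, K, V are the query, key, and value matrices, d is the key dimension, and Θ is a learnable head-mixing matrix applied across attention heads), they would read roughly as:

  Attention(Q, K, V) = softmax(Q K^T / \sqrt{d}) V
  Re-Attention(Q, K, V) = Norm(\Theta^T softmax(Q K^T / \sqrt{d})) V

In the DeepViT formulation from which re-attention originates, Θ re-mixes the attention maps across heads to counteract attention collapse in deeper vision transformers; the exact forms and normalization in the citing paper may differ.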
“…This figure also reveals another critical point: although most deepfake detection approaches perform well on relatively more straightforward datasets, i.e., FaceForensics++, their performance is still far from perfect on more challenging and real-world datasets, i.e., WildDeepfake.…”
[Table/figure residue: per-method accuracy (ACC %) comparison: [46] 99.64; F3-Net [73] 65.1; ADD-Xception [18] 79.23; RNN [74] 83.10; PPA [75] 83.1; DefakeHop [6] 90.5; FakeCatcher [15] 91.5; ATS-DE [7] 97.8; ADD-ResNet [18] 98. (truncated); plus chart labels MesoNet-4 [34], ADDNet-3D [75], MesoNet-inception [34], XceptionNet [65], ADDNet-2D [75], ADD-Xception [55], DFDT (Ours).]
Section: Intra-dataset Evaluation
Mentioning confidence: 99%
“…Lately, more SOTA models and methods have been developed on this topic that we haven't explored yet. In the future, we hope to compare and enhance our methods with the use of vision transformers and more types of visual attention mechanisms without a convolutional pipeline [67,68].…”
Section: Discussion
Mentioning confidence: 99%