LiSiam: Localization Invariance Siamese Network for Deepfake Detection

Wang, Jian; Sun, Yunlian; Tang, Jinhui

doi:10.1109/tifs.2022.3186803

Cited by 38 publications

(7 citation statements)

References 45 publications

“…For example, Wang et al [24] proposed a multi-regional attention mechanism to enhance deepfake detection performance. Additionally, vision Transformers have been employed to establish an end-to-end deepfake detection framework [17], [40]. Furthermore, recent research has utilized pre-trained networks and developed Lipforensics [11] for analyzing lip prints in lip-reading tasks.…”

Section: A Deepfake Detection (mentioning)

confidence: 99%

DFDT: An End-to-End DeepFake Detection Framework Using Vision Transformer

Khormali

Yuan

2022

Applied Sciences

The ever-growing threat of deepfakes and large-scale societal implications has propelled the development of deepfake forensics to ascertain the trustworthiness of digital media. A common theme of existing detection methods is using Convolutional Neural Networks (CNNs) as a backbone. While CNNs have demonstrated decent performance on learning local discriminative information, they fail to learn relative spatial features and lose important information due to constrained receptive fields. Motivated by the aforementioned challenges, this work presents DFDT, an end-to-end deepfake detection framework that leverages the unique characteristics of transformer models, for learning hidden traces of perturbations from both local image features and global relationship of pixels at different forgery scales. DFDT is specifically designed for deepfake detection tasks consisting of four main components: patch extraction & embedding, multi-stream transformer block, attention-based patch selection followed by a multi-scale classifier. DFDT’s transformer layer benefits from a re-attention mechanism instead of a traditional multi-head self-attention layer. To evaluate the performance of DFDT, a comprehensive set of experiments are conducted on several deepfake forensics benchmarks. Obtained results demonstrated the surpassing detection rate of DFDT, achieving 99.41%, 99.31%, and 81.35% on FaceForensics++, Celeb-DF (V2), and WildDeepfake, respectively. Moreover, DFDT’s excellent cross-dataset & cross-manipulation generalization provides additional strong evidence on its effectiveness.

“…For example, Wang et al [24] proposed a multi-regional attention mechanism to enhance deepfake detection performance. Additionally, vision Transformers have been employed to establish an end-to-end deepfake detection framework [17], [40]. Furthermore, recent research has utilized pre-trained networks and developed Lipforensics [11] for analyzing lip prints in lip-reading tasks.…”

Section: Related Work A Deepfake Detection (mentioning)

confidence: 99%

Self-Supervised Graph Transformer for Deepfake Detection

Khormali

Yuan

2024

IEEE Access

Deepfake detection methods have shown promising results in recognizing forgeries within a given dataset, where training and testing take place on the in-distribution dataset. However, their performance deteriorates significantly when presented with unseen samples. As a result, a reliable deepfake detection system must remain impartial to forgery types, appearance, and quality for guaranteed generalizable detection performance. Despite various attempts to enhance cross-dataset generalization, the problem remains challenging, particularly when testing against common post-processing perturbations, such as video compression or blur. Hence, this study introduces a deepfake detection framework, leveraging a self-supervised pre-training model that delivers exceptional generalization ability, withstanding common corruptions and enabling feature explainability. The framework comprises three key components: a feature extractor based on vision Transformer architecture that is pre-trained via self-supervised contrastive learning methodology, a graph convolution network coupled with a Transformer discriminator, and a graph Transformer relevancy map that provides a better understanding of manipulated regions and further explains the model's decision. To assess the effectiveness of the proposed framework, several challenging experiments are conducted, including in-data distribution performance, cross-dataset & cross-manipulation generalization, and robustness against common post-production perturbations. The results achieved demonstrate the remarkable effectiveness of the proposed deepfake detection framework, surpassing the current state-of-the-art approaches.

“…The similar idea is adopted in RECCE [49], where only real images are reconstructed from their noisy versions. Lisiam [50] explores the robust representation by using localization invariance loss, while [51] and [52] exploit the relation between local regions to reveal the discriminative information. Additionally, RFM [53] proposes an attention-based erasing operation to encourage the model to learn features from more potential manipulation regions.…”

Section: B Face Forgery Detection Via Representation Learning (mentioning)

confidence: 99%

“…It is challenging to extract representative features from degraded inputs since the forgery clues are too subtle to mine [50], [70]. In this paper, we cast the problem of detecting face forgery as a prototype learning task.…”

Section: B Fine-grained Triplet Relation Learning (mentioning)

confidence: 99%

“…tasks such as meta-learning [39], domain adaptation [40], and knowledge distillation [41]. With pixel-level supervision information, Lisiam [42] explores robust representation via localization invariance learning, while [43] and [44] leverage discriminative cues through local relation learning. However, pixel-level ground truth is difficult to access in real-world applications.…”

Section: B Representation Learning (mentioning)

confidence: 99%

Dual Attention Network Approaches to Face Forgery Video Detection

Luo

Chen

2022

IEEE Access

Forged videos are commonly spread online. Most have malicious content and cause serious information security problems. The most critical issue in deepfake detection is the identification of traces of tampering in fake videos. This study designs a Dual Attention Forgery Detection Network (DAFDN), which embeds a spatial reduction attention block (SRAB) and a forgery feature attention module (FFAM) to the backbone network. DAFDN embeds the two proposed attention mechanisms and enables the convolution neural network to extract peculiar traces left by images' warping. This study uses two benchmark datasets, DFDC and FaceForensics++, to compare the performance of the proposed DAFDN with other methods. The results show that the proposed DAFDN mechanism achieves AUC scores of 0.911 and 0.945 in the datasets DFDC and FaceForensics++, respectively. These results are better than those of previously developed methods, such as XceptionNet and EfficientNet-related methods.
