Deep learning models that can produce and synthesize hyper-realistic videos, known as DeepFakes, have developed rapidly. The growth of such forgery data has prompted concerns about its use with malicious intent. Detecting forged videos is therefore a crucial subject in the field of digital media. Most current models are based on deep neural networks and vision transformers; the SOTA model uses an EfficientNetB7 backbone. However, because of their excessively large backbones, these models have the intrinsic drawback of being too heavy. In our research, we propose a high-performance DeepFake detection model for manipulated videos that preserves accuracy while keeping an appropriate weight. We inherit ideas from previous research on distillation methodology, but our proposal takes a different approach, combining manual distillation extraction, target-specific region extraction, data augmentation, frame and multi-region ensembling, a CNN-based model, and flexible classification with a dynamic threshold. Our proposal can reduce overfitting, a common and particularly important problem affecting the quality of many models. To analyze the quality of our model, we performed tests on two datasets. On the DeepFake Detection Challenge (DFDC) dataset, our model obtains an AUC of 0.958 and an F1-score of 0.9243, compared with the SOTA model's AUC of 0.972 and F1-score of 0.906; on the smaller Celeb-DF v2 dataset, it obtains an AUC of 0.978 and an F1-score of 0.9628.
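The frame and multi-region ensembling with a dynamic threshold mentioned above can be sketched roughly as follows. This is a minimal illustration only: it assumes per-frame, per-region fake probabilities are already produced by the CNN classifier, and the function names and the F1-maximizing threshold rule are hypothetical stand-ins, not the paper's exact procedure.

```python
def ensemble_video_score(frame_region_probs):
    """Average fake probabilities over regions, then over frames.

    frame_region_probs: list of per-frame lists, each holding the
    per-region fake probabilities predicted by the classifier.
    """
    per_frame = [sum(regions) / len(regions) for regions in frame_region_probs]
    return sum(per_frame) / len(per_frame)

def dynamic_threshold(val_scores, val_labels):
    """Pick the decision threshold that maximizes F1 on a validation
    set -- one plausible way to make the boundary 'dynamic'."""
    best_t, best_f1 = 0.5, -1.0
    for i in range(5, 96, 5):          # candidate thresholds 0.05 .. 0.95
        t = i / 100
        preds = [1 if s >= t else 0 for s in val_scores]
        tp = sum(p == 1 and y == 1 for p, y in zip(preds, val_labels))
        fp = sum(p == 1 and y == 0 for p, y in zip(preds, val_labels))
        fn = sum(p == 0 and y == 1 for p, y in zip(preds, val_labels))
        f1 = 2 * tp / max(2 * tp + fp + fn, 1)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t
```

A video would then be classified as fake when `ensemble_video_score(...)` exceeds the threshold chosen on held-out data, rather than a fixed 0.5 cutoff.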
Face forgery generation algorithms that produce a range of manipulated videos and images have developed quickly. This has caused an increase in the production of fake information that is difficult to identify. Because facial manipulation technologies raise severe concerns, face forgery detection is gaining increasing attention in the area of computer vision. In real-world applications, face forgery detection systems frequently encounter unseen domains and perform poorly on them due to poor generalization. In this paper, we propose a deepfake detection method based on meta-learning called Meta Deepfake Detection (MDD). The goal of the model is to generalize well enough to handle new unseen domains directly, without the need for model updates. The MDD algorithm assigns different weights to facial images from different domains. Specifically, MDD uses meta-weight learning to transfer information from the source domains to the target domains through meta-optimization steps, so that the model learns effective representations of both. We build multi-domain sets using a meta-splitting strategy that creates a meta-train set and a meta-test set. On these sets, the model computes gradients via backpropagation and performs gradient descent; the inner- and outer-loop gradients are aggregated to update the model and enhance generalization. By introducing a pair-attention loss and an average-center alignment loss, the detection capability of the system is substantially enhanced. In addition, we use evaluation benchmarks built from several popular deepfake datasets to compare the generalization of our proposal against several baselines and assess its effectiveness.
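The meta-optimization step described above, with a meta-train/meta-test split and aggregated inner- and outer-loop gradients, can be sketched schematically in the style of first-order MAML. This is an illustrative toy, not the MDD implementation: the model is a single scalar parameter with a squared-error loss, and the learning rates and gradient aggregation rule are assumptions.

```python
def meta_update(theta, domains, inner_lr=0.1, outer_lr=0.05):
    """One meta-optimization step over multi-domain data.

    theta:   model parameter (a single scalar here, for illustration)
    domains: list of (meta_train, meta_test) pairs, each a list of
             (x, y) samples; the toy loss is squared error of theta * x.
    """
    def grad(t, batch):  # d/dt of mean squared error over the batch
        return sum(2 * (t * x - y) * x for x, y in batch) / len(batch)

    per_domain_grads = []
    for meta_train, meta_test in domains:
        g_inner = grad(theta, meta_train)           # inner-loop gradient
        theta_adapted = theta - inner_lr * g_inner  # simulated adaptation
        g_outer = grad(theta_adapted, meta_test)    # outer-loop gradient
        # aggregate inner and outer gradients (first-order approximation)
        per_domain_grads.append(g_inner + g_outer)

    # update the shared parameters with the averaged aggregated gradient
    return theta - outer_lr * sum(per_domain_grads) / len(per_domain_grads)
```

Because the outer-loop gradient is evaluated on the meta-test split after simulated adaptation on the meta-train split, the update favors parameters that transfer across domains rather than fitting any single source domain.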