Recently, Deepfakes have drawn considerable public attention due to security and privacy concerns in social media digital forensics. As the widely spreading Deepfake videos on the Internet become more realistic, traditional detection techniques fail to distinguish between real and fake. Most existing deep learning methods focus mainly on local features and relations within the face image, using convolutional neural networks as a backbone. However, local features and relations alone are insufficient for a model to learn enough general information for Deepfake detection, so existing detection methods have reached a bottleneck in further improving performance. To address this issue, we propose a deep convolutional Transformer that incorporates decisive image features both locally and globally. Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy. Moreover, we employ the rarely discussed image keyframes in model training for performance improvement and visualize the feature quantity gap, caused by video compression, between keyframes and normal frames. Finally, we illustrate transferability with extensive experiments on several Deepfake benchmark datasets. The proposed solution consistently outperforms several state-of-the-art baselines in both within- and cross-dataset experiments.
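The re-attention mechanism mentioned above (introduced in the vision-Transformer literature) mixes the attention maps of different heads with a learnable matrix before applying them to the values. The following is a minimal NumPy sketch of that idea; the function and parameter names (`re_attention`, `theta`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def re_attention(Q, K, V, theta):
    """Sketch of re-attention (names are illustrative).
    Q, K, V: arrays of shape (heads, tokens, dim).
    theta:   learnable (heads, heads) mixing matrix."""
    h, n, d = Q.shape
    # standard scaled dot-product attention maps, one per head
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d))        # (h, n, n)
    # mix attention maps across heads: A'[i] = sum_j theta[i, j] * A[j]
    A_mixed = np.einsum('ij,jnm->inm', theta, A)
    # renormalize rows so each remains a distribution over tokens
    A_mixed = A_mixed / A_mixed.sum(axis=-1, keepdims=True)
    return A_mixed @ V                                        # (h, n, d)
```

With `theta` set to the identity matrix, this reduces to ordinary multi-head attention; a non-trivial `theta` lets heads share information at the attention-map level, which is the diversity-enhancing effect the abstract alludes to.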
The prevalence of location-based services has generated a deluge of check-ins, enabling the task of human mobility understanding. Among the various types of information associated with check-in venues, categories (e.g., Bar and Museum) are vital to the task, as they often serve as an excellent semantic characterization of the venues. Despite their importance, a large portion of venues in check-in services lack even a single category label; for example, up to 30% of venues in the Foursquare system have no category label. We therefore address the problem of semantic venue annotation, i.e., labeling a venue with a semantic category. Existing methods either fail to fully exploit the contextual information in check-in sequences or do not consider the semantic correlations across related categories. We thus devise a Tree-guided Multi-task Embedding model (TME for short) to learn effective representations of venues and categories for semantic annotation. TME jointly learns a common feature space by modeling multiple contexts of check-ins and utilizes the predefined category hierarchy to regularize the relatedness among categories. We evaluate TME on the task of semantic venue annotation on two check-in datasets. Experimental results show the superiority of TME over several state-of-the-art baselines.
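One common way to use a predefined category hierarchy as a regularizer, as the abstract describes, is to penalize the distance between each category embedding and its parent's embedding in the tree. The sketch below illustrates that idea in NumPy; the function name `tree_regularizer` and the `parent` mapping are hypothetical illustrations, not TME's actual formulation.

```python
import numpy as np

def tree_regularizer(cat_emb, parent):
    """Illustrative tree-guided regularizer: pull each category
    embedding toward its parent's embedding in the hierarchy.
    cat_emb: (num_categories, dim) embedding matrix.
    parent:  dict mapping category index -> parent index
             (the root maps to itself)."""
    loss = 0.0
    for c, p in parent.items():
        if c != p:  # skip the root, which has no parent
            diff = cat_emb[c] - cat_emb[p]
            loss += float(diff @ diff)  # squared Euclidean distance
    return loss
```

Adding such a term to the training objective encourages sibling categories (e.g., children of the same parent like Food) to stay close in the embedding space, which is one way to encode the "relatedness among categories" that the hierarchy provides.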