Automated deception detection systems can enhance health, justice, and security in society by helping humans detect deceivers in high-stakes situations across medical and legal domains, among others. Existing machine learning approaches for deception detection have not leveraged dimensional representations of facial affect: valence and arousal. This paper presents a novel analysis of the discriminative power of facial affect for automated deception detection, along with interpretable features from visual, vocal, and verbal modalities. We used a video dataset of people communicating truthfully or deceptively in real-world, high-stakes courtroom situations. We leveraged recent advances in automated emotion recognition in-the-wild by implementing a state-of-the-art deep neural network trained on the Aff-Wild database to extract continuous representations of facial valence and facial arousal from speakers. We experimented with unimodal Support Vector Machines (SVM) and SVM-based multimodal fusion methods to identify effective features, modalities, and modeling approaches for detecting deception. Unimodal models trained on facial affect achieved an AUC of 80%, and facial affect contributed towards the highest-performing multimodal approach (adaptive boosting) that achieved an AUC of 91% when tested on speakers who were not part of training sets. This approach achieved a higher AUC than existing automated machine learning approaches that used interpretable visual, vocal, and verbal features to detect deception in this dataset, but did not use facial affect. Across all videos, deceptive and truthful speakers exhibited significant differences in facial valence and facial arousal, contributing computational support to existing psychological theories on relationships between affect and deception.
The demonstrated importance of facial affect in our models informs and motivates the future development of automated, affect-aware machine learning approaches for modeling and detecting deception and other social behaviors in-the-wild.
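The evaluation protocol described above, unimodal SVMs and a boosting-based fusion model scored by AUC on held-out speakers, can be sketched with scikit-learn on synthetic data. The feature values, labels, speaker assignments, and injected class signal below are all placeholders for illustration, not the paper's data; the grouping by speaker mirrors the "speakers not part of training sets" condition.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for per-video facial-affect features (e.g. summary
# statistics of valence and arousal), with a weak class signal injected.
n_videos, n_features = 120, 4
X = rng.normal(size=(n_videos, n_features))
y = rng.integers(0, 2, size=n_videos)   # 1 = deceptive, 0 = truthful
X[y == 1] += 0.8                        # injected separation (illustrative)
speakers = np.arange(n_videos) % 30     # 30 hypothetical speaker identities

# Speaker-independent evaluation: GroupKFold keeps all of a speaker's videos
# entirely in either the train or the test split of each fold.
cv = GroupKFold(n_splits=5)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm_auc = cross_val_score(svm, X, y, cv=cv, groups=speakers, scoring="roc_auc")

ada = AdaBoostClassifier(n_estimators=100, random_state=0)
ada_auc = cross_val_score(ada, X, y, cv=cv, groups=speakers, scoring="roc_auc")

print(f"SVM AUC: {svm_auc.mean():.2f}  AdaBoost AUC: {ada_auc.mean():.2f}")
```

With real data, the feature matrix would hold the extracted valence/arousal representations (and, for fusion, the vocal and verbal features), but the cross-validation structure is the key point: grouping folds by speaker prevents identity leakage from inflating the reported AUC.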
Automated systems that detect deception in high-stakes situations can enhance societal well-being across medical, social work, and legal domains. Existing models for detecting high-stakes deception in videos have been supervised, but labeled datasets to train models can rarely be collected for most real-world applications. To address this problem, we propose the first multimodal unsupervised transfer learning approach that detects real-world, high-stakes deception in videos without using high-stakes labels. Our subspace-alignment (SA) approach adapts audio-visual representations of deception in lab-controlled low-stakes scenarios to detect deception in real-world, high-stakes situations. Our best unsupervised SA models outperform models without SA, outperform human ability, and perform comparably to a number of existing supervised models. Our research demonstrates the potential for introducing subspace-based transfer learning to model high-stakes deception and other social behaviors in real-world contexts with a scarcity of labeled behavioral data.
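The core of subspace alignment (in the standard formulation of Fernando et al.) is to learn PCA subspaces for the source and target domains and rotate the source subspace onto the target's before training a classifier. The following is a minimal sketch of that idea on synthetic data; the domains, feature dimensions, and subspace size are illustrative assumptions, not the paper's actual audio-visual representations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Synthetic stand-in for the two domains: labeled low-stakes (source) data
# and high-stakes (target) data exhibiting a covariate shift.
Xs = rng.normal(size=(200, 10))
ys = rng.integers(0, 2, size=200)
Xs[ys == 1, :3] += 1.0                                  # class signal
Xt = Xs + rng.normal(scale=0.5, size=Xs.shape) + 0.7    # shifted target
yt = ys  # target labels used only to evaluate, never to train

d = 5  # subspace dimensionality, a free parameter of SA
Ps = PCA(n_components=d).fit(Xs).components_.T  # source basis (10 x d)
Pt = PCA(n_components=d).fit(Xt).components_.T  # target basis (10 x d)

# Subspace alignment: project source data into its own subspace, then
# rotate that subspace onto the target subspace via M = Ps^T Pt.
M = Ps.T @ Pt
Xs_aligned = Xs @ Ps @ M   # source features in aligned coordinates
Xt_proj = Xt @ Pt          # target features in the target subspace

clf = SVC().fit(Xs_aligned, ys)       # trained only on source labels
acc = clf.score(Xt_proj, yt)          # evaluated on the target domain
print(f"target accuracy after SA: {acc:.2f}")
```

The classifier never sees a target label, which is what makes the transfer unsupervised with respect to the high-stakes domain; only the unlabeled target features inform the alignment.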