In recent years, the analysis of macro- and micro-expressions has drawn the attention of researchers, as these expressions provide visual cues to an individual's emotions and support a broad range of potential applications such as lie detection and criminal investigation. In this paper, we address the challenge of spotting facial macro- and micro-expressions in videos and present compelling results using a deep learning approach to analyze optical flow features. Unlike other deep learning approaches, which are mainly based on Convolutional Neural Networks (CNNs), we propose a Transformer-based approach that predicts a score indicating the probability that a frame lies within an expression interval. In contrast to other Transformer-based models, which achieve high performance by being pre-trained on large datasets, our model, called SL-Swin, incorporates Shifted Patch Tokenization and Locality Self-Attention into the Swin Transformer backbone and effectively spots macro- and micro-expressions when trained from scratch on small expression datasets. Our evaluation results surpass the MEGC 2022 spotting baseline, with an overall F1-score of 0.1366. Our approach also performs well on the MEGC 2021 spotting task, achieving overall F1-scores of 0.1824 and 0.1357 on CAS(ME)^2 and SAMM Long Videos, respectively. The code is publicly available on GitHub (https://github.com/eddiehe99/pytorch-expression-spotting).
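The abstract states that SL-Swin incorporates Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA) into the Swin Transformer backbone. The snippet below is a minimal PyTorch sketch of those two components as described for small-dataset Vision Transformers; the layer sizes, the use of torch.roll for the diagonal shifts, and the demo input shape are illustrative assumptions, not the authors' SL-Swin implementation.

```python
import torch
import torch.nn as nn


class ShiftedPatchTokenization(nn.Module):
    """Shifted Patch Tokenization: concatenate the input with four
    half-patch diagonal shifts before patch embedding, so each token
    covers a larger spatial neighborhood."""

    def __init__(self, in_chans=3, embed_dim=96, patch_size=4):
        super().__init__()
        self.patch_size = patch_size
        token_dim = in_chans * 5 * patch_size * patch_size
        self.norm = nn.LayerNorm(token_dim)
        self.proj = nn.Linear(token_dim, embed_dim)

    def forward(self, x):                                   # x: (B, C, H, W)
        s = self.patch_size // 2
        # circular shifts for brevity; the original SPT zero-pads and crops
        shifted = [torch.roll(x, shifts=(dy, dx), dims=(2, 3))
                   for dy, dx in [(s, s), (s, -s), (-s, s), (-s, -s)]]
        x = torch.cat([x] + shifted, dim=1)                 # (B, 5C, H, W)
        p = self.patch_size
        patches = x.unfold(2, p, p).unfold(3, p, p)         # (B, 5C, H/p, W/p, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).flatten(3).flatten(1, 2)
        return self.proj(self.norm(patches))                # (B, N, embed_dim)


class LocalitySelfAttention(nn.Module):
    """Locality Self-Attention: learnable temperature scaling plus
    diagonal masking so each token cannot attend to itself."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # learnable temperature replaces the fixed 1/sqrt(d) scaling
        self.temperature = nn.Parameter(torch.tensor(head_dim ** -0.5))

    def forward(self, x):                                   # x: (B, N, C)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)                # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        # diagonal masking: suppress self-token attention before softmax
        diag = torch.eye(N, dtype=torch.bool, device=x.device)
        attn = attn.masked_fill(diag, float("-inf"))
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


if __name__ == "__main__":
    # hypothetical demo input; SL-Swin operates on optical flow features
    tokens = ShiftedPatchTokenization()(torch.randn(1, 3, 64, 64))  # (1, 256, 96)
    print(LocalitySelfAttention(96)(tokens).shape)                  # torch.Size([1, 256, 96])
```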