Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3479209

Facial Action Unit-based Deep Learning Framework for Spotting Macro- and Micro-expressions in Long Video Sequences

Cited by 26 publications (7 citation statements)
References 9 publications
“…Facial expressions can be divided into individual muscle movement components known as Action Units (AUs) [5]. As shown by the experiment described in [32], a single macro- or micro-expression may have more than one AU with high intensity.…”
Section: Swin Transformer (mentioning, confidence: 99%)
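To make the AU-intensity observation above concrete, here is a minimal sketch that, given per-frame AU intensity estimates (e.g., from an AU detector such as OpenFace), reports which AUs exceed a high-intensity threshold in each frame. The 0-5 intensity scale, the threshold value, and the `high_intensity_aus` helper are illustrative assumptions, not part of the cited work.

```python
import numpy as np

def high_intensity_aus(au_intensities, threshold=3.0):
    """au_intensities: (frames, num_aus) array of AU intensities (0-5 scale).
    Returns, per frame, the indices of AUs whose intensity passes threshold."""
    return [np.flatnonzero(frame > threshold) for frame in au_intensities]

# Example: a single expression frame can activate several AUs at once.
frame_aus = np.array([[0.2, 4.1, 3.8, 0.5]])  # two AUs high in one frame
print(high_intensity_aus(frame_aus))           # -> [array([1, 2])]
```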
“…The existing macro- and micro-expression spotting approaches can be roughly divided into traditional approaches and deep learning approaches [5]. Traditional expression spotting approaches use manually crafted features to determine whether or not a frame is an expression frame.…”
Section: Introduction (mentioning, confidence: 99%)
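As an illustration of the handcrafted-feature style of spotting described above, the following minimal sketch flags a frame as a candidate expression frame when the mean optical-flow magnitude between consecutive frames exceeds a threshold. The threshold value and the `spot_candidate_frames` helper are illustrative assumptions rather than any specific method from the cited works.

```python
import cv2
import numpy as np

def spot_candidate_frames(gray_frames, threshold=0.5):
    """Return indices of frames whose mean optical-flow magnitude
    relative to the previous frame exceeds the threshold."""
    candidates = []
    for i in range(1, len(gray_frames)):
        # Dense Farneback optical flow between consecutive grayscale frames.
        flow = cv2.calcOpticalFlowFarneback(
            gray_frames[i - 1], gray_frames[i], None,
            0.5, 3, 15, 3, 5, 1.2, 0)
        magnitude = np.linalg.norm(flow, axis=2)  # per-pixel motion strength
        if magnitude.mean() > threshold:
            candidates.append(i)
    return candidates
```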
“…The frame-based spotting task, including locating the onset, apex, and offset of a micro-expression (ME), has been explored by Patel et al. using a discriminative response map fitting (DRMF) model [14]. Guo et al. considered motion angle information and designed magnitude and angle combined (MAC) optical flow features to improve spotting efficiency [15]. In recent years, deep learning models have been widely used to detect the interval or apex frame of an ME, using features extracted from raw images in video sequences as input [16]. Some interval-based deep learning models use video clips as input and employ long short-term memory (LSTM) [17][18][19] or a clip proposal network [20] to obtain potential ME intervals.…”
Section: Introduction (mentioning, confidence: 99%)
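The interval-based LSTM idea referenced above can be sketched as follows: per-frame features from a clip are fed to an LSTM that scores each frame's probability of lying inside an ME interval. The feature dimension, hidden size, and the `ClipLSTMSpotter` module are illustrative assumptions; the cited models [17-19] differ in their details.

```python
import torch
import torch.nn as nn

class ClipLSTMSpotter(nn.Module):
    def __init__(self, feature_dim=256, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # per-frame expression score

    def forward(self, clip_features):         # (batch, frames, feature_dim)
        hidden, _ = self.lstm(clip_features)
        return torch.sigmoid(self.head(hidden)).squeeze(-1)  # (batch, frames)

scores = ClipLSTMSpotter()(torch.randn(2, 30, 256))  # two 30-frame clips
```

Frames whose score passes a threshold can then be merged into candidate intervals.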
“…Variant CNN-based models have also been employed. For example, a Concat-CNN model consisting of three streams of convolutional networks with different convolution kernel sizes [30] was proposed to learn feature correlations among the facial action units (AUs) of different frames. In addition, a local bilinear convolutional neural network (LBCNN) [31] was proposed to transform the micro-expression spotting task into a fine-grained image recognition problem.…”
Section: Introduction (mentioning, confidence: 99%)
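A minimal sketch of the multi-kernel-stream idea behind the Concat-CNN cited above: three convolutional streams with different kernel sizes whose pooled outputs are concatenated before classification. The channel counts, kernel sizes, input shape, and the `ConcatCNN` module here are illustrative assumptions; see [30] for the actual architecture.

```python
import torch
import torch.nn as nn

class ConcatCNN(nn.Module):
    def __init__(self, in_channels=1, out_channels=16):
        super().__init__()
        # One stream per kernel size; padding keeps spatial size unchanged.
        self.streams = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, k, padding=k // 2),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1))
            for k in (3, 5, 7)
        ])
        self.classifier = nn.Linear(3 * out_channels, 2)  # expression / neutral

    def forward(self, x):                      # (batch, channels, H, W)
        feats = [s(x).flatten(1) for s in self.streams]
        return self.classifier(torch.cat(feats, dim=1))

logits = ConcatCNN()(torch.randn(4, 1, 64, 64))
```

Concatenating streams with different receptive fields lets the classifier weigh facial motion cues at several spatial scales at once.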