2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.02025
FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos

Cited by 55 publications (14 citation statements); References 39 publications.
“…We use two in-the-wild DFER datasets (i.e., DFEW [14] and FERV39K [15]) to evaluate our proposed method. For both DFEW and FERV39K, the processed face region images are officially detected, aligned and publicly available.…”
Section: Methods
confidence: 99%
“…MAE is a self-supervised model trained on label-free data by masking random patches of the input image and reconstructing the missing patches in pixel space. We used a face dataset of 1.2 million images, including DFEW [24], EmotioNet [25], FERV39k [26], and others, to pre-train the MAE encoder. We denote this feature as mae; its dimension is 768.…”
Section: Visual Features
confidence: 99%
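The MAE-style pre-training described in the excerpt above (masking random patches and reconstructing them in pixel space) can be sketched at the masking step as follows. The helper `random_patch_mask`, the 16-pixel patch size, and the 75% mask ratio are illustrative assumptions following common MAE defaults, not details taken from the cited paper:

```python
import numpy as np

def random_patch_mask(image, patch=16, mask_ratio=0.75, seed=0):
    """Split an image into non-overlapping patches and zero out a random
    subset, as in MAE-style masking (hypothetical helper; patch size and
    75% ratio are assumed defaults)."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    n = gh * gw                       # total number of patches
    rng = np.random.default_rng(seed)
    masked_idx = rng.choice(n, size=int(n * mask_ratio), replace=False)
    masked = image.copy()
    for idx in masked_idx:
        r, col = divmod(idx, gw)
        masked[r*patch:(r+1)*patch, col*patch:(col+1)*patch, :] = 0.0
    return masked, masked_idx

img = np.ones((224, 224, 3), dtype=np.float32)
masked, idx = random_patch_mask(img)
print(len(idx))  # 147 of the 196 patches are masked (75%)
```

The encoder would then see only the visible patches, and a lightweight decoder would be trained to reconstruct the masked ones in pixel space.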
“…Researchers have proposed various techniques to effectively improve the performance of DFER methods in laboratory scenarios (Yu et al 2018; Jeong, Kim, and Dong 2020). Compared with lab-controlled DFER datasets, the in-the-wild ones are closer to natural facial events and can provide more diverse data by collecting video sequences from the internet, such as Aff-Wild (Zafeiriou et al 2017), DFEW (Jiang et al 2020), and FERV39k (Wang et al 2022). As shown in Figure 1, video sequences in the real world with different expression intensities can cause the inter-class distance to become smaller than the intra-class distance.…”
Section: Introduction
confidence: 99%
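The inter-class versus intra-class distance problem raised in the excerpt above can be illustrated with a toy example. All feature vectors here are hypothetical, chosen only to show how differing expression intensities can push same-class samples further apart than samples from different classes:

```python
import numpy as np

# Hypothetical 2-D features for two expression classes.
happy_strong = np.array([1.0, 0.0])   # high-intensity "happy"
happy_weak   = np.array([0.1, 0.0])   # low-intensity "happy" (same class)
sad_weak     = np.array([0.0, 0.2])   # low-intensity "sad" (other class)

intra = np.linalg.norm(happy_strong - happy_weak)  # within-class distance
inter = np.linalg.norm(happy_weak - sad_weak)      # between-class distance
print(intra > inter)  # True: intra-class distance exceeds inter-class
```

When intensity varies this much, a nearest-neighbor decision on raw features would misgroup the weak samples, which is the failure mode the cited work targets.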
“…For DFER in the wild, early works were mainly designed around hand-crafted features, like LBP-TOP (Dhall et al 2013), STLMBP (Huang et al 2014), and HOG-TOP (Chen et al 2014). In recent years, with the development of parallel computing hardware and the collection of large-scale DFER datasets (Wang et al 2022; Jiang et al 2020), deep learning-based methods have gradually replaced algorithms based on hand-crafted features and achieved state-of-the-art performance on in-the-wild DFER datasets. For instance, the vision transformer (ViT) (Dosovitskiy et al 2020) has obtained promising results on many computer vision tasks, which has inspired many researchers to build DFER models based on ViT (Ma, Sun, and Li 2022).…”
Section: Introduction
confidence: 99%