2021 IEEE International Conference on Multimedia and Expo (ICME)
DOI: 10.1109/icme51207.2021.9428100
FFNet-M: Feature Fusion Network with Masks for Multimodal Facial Expression Recognition

Cited by 9 publications (9 citation statements)
References 18 publications
“…Here, we use the gridfit interpolation [17] and projection for each 3D scan to obtain aligned RGB images and depth maps. Then, we apply the surface processing comprising three steps [7], namely outlier removal, hole filling, and noise removal, to improve the data quality.…”
Section: Preprocessing
confidence: 99%
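The preprocessing described in this excerpt (grid interpolation of each 3D scan into a depth map, followed by outlier removal, hole filling, and noise removal) can be sketched as follows. This is a minimal illustration assuming scattered 3D vertices as input; scipy's griddata stands in for the gridfit interpolation [17], and the thresholds, filter sizes, and function names are illustrative, not the authors' implementation.

```python
# Hypothetical sketch: project a 3D face scan onto a regular grid to get a
# depth map, then clean it with outlier removal, hole filling, and noise
# removal. griddata is a stand-in for the gridfit interpolation cited above.
import numpy as np
from scipy.interpolate import griddata
from scipy.ndimage import median_filter

def scan_to_depth_map(points, size=224):
    """Project scattered 3D vertices (N, 3) onto a size x size depth map."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Regular grid spanning the face region.
    gx, gy = np.meshgrid(
        np.linspace(x.min(), x.max(), size),
        np.linspace(y.min(), y.max(), size),
    )
    # Interpolate depth values onto the grid (linear; gaps become NaN holes).
    return griddata((x, y), z, (gx, gy), method="linear")

def clean_depth_map(depth, z_thresh=3.0):
    """Outlier removal, hole filling, and noise removal on a depth map."""
    d = depth.copy()
    # 1) Outlier removal: discard depths far from the robust center.
    med, std = np.nanmedian(d), np.nanstd(d)
    d[np.abs(d - med) > z_thresh * std] = np.nan
    # 2) Hole filling: replace NaNs with a median of nearby valid values
    #    (a simple stand-in for the filling step in [7]).
    filled = median_filter(np.nan_to_num(d, nan=med), size=5)
    d = np.where(np.isnan(d), filled, d)
    # 3) Noise removal: light median smoothing of the surface.
    return median_filter(d, size=3)
```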
“…After that, we calculate the average of them to generate the input for ViT. The previous methods try to map a 3D scan into several three-channel pseudo-color images matching the RGB image [20] in order to directly utilize common backbone networks such as VGG16 [21] and ResNet [22] for processing 3D information [4,7]. Therefore, these approaches usually require multi-branch networks with independent parameters to handle different-modal data and fuse them at the feature level.…”
Section: Alternative Fusion Strategy
confidence: 99%
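The multi-branch, feature-level fusion that this excerpt contrasts with its ViT input can be sketched roughly as below. This is a minimal PyTorch sketch assuming ResNet-18 backbones from torchvision in place of the cited VGG16/ResNet variants; the class name, fusion by concatenation, and six expression classes are assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch of multi-branch, feature-level fusion: one backbone per
# modality (RGB image and a pseudo-color depth map), features concatenated
# before classification.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class TwoBranchFusionNet(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        # Independent parameters per modality, as in the cited approaches.
        self.rgb_branch = resnet18(weights=None)
        self.depth_branch = resnet18(weights=None)
        feat_dim = self.rgb_branch.fc.in_features  # 512 for ResNet-18
        # Strip the classifiers so each branch outputs a feature vector.
        self.rgb_branch.fc = nn.Identity()
        self.depth_branch.fc = nn.Identity()
        # Feature-level fusion: concatenate, then classify.
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, rgb, depth):
        f_rgb = self.rgb_branch(rgb)        # (B, 512)
        f_depth = self.depth_branch(depth)  # (B, 512)
        fused = torch.cat([f_rgb, f_depth], dim=1)
        return self.classifier(fused)

# Usage: both inputs are 3-channel images; the depth map is rendered as a
# pseudo-color image so the same kind of backbone can consume it.
model = TwoBranchFusionNet(num_classes=6)
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
```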
“…Jiao et al. [13] proposed FA-CNN to localize discriminative facial parts, but the receptive fields also attend to irrelevant areas such as the forehead, and their distribution is not stable enough, as the heat-map visualizations show. Sui et al. [17] designed masks to directly enhance the local features across the whole salient regions; however, different components make different contributions to the judgment of one expression. For example, the features of the eyes and mouth are more critical than those of the nose.…”
Section: Introduction
confidence: 99%
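A minimal sketch of mask-based local-feature enhancement with per-region importance weights, reflecting the point that the eyes and mouth contribute more than the nose. The region masks, learnable weights, and residual-style enhancement here are assumptions for illustration only, not the FFNet-M or [17] formulation.

```python
# Hypothetical per-region weighted masking of a feature map; region masks and
# learnable importance weights are illustrative assumptions.
import torch
import torch.nn as nn

class WeightedRegionMask(nn.Module):
    def __init__(self, region_masks):
        """region_masks: (R, H, W) binary masks, e.g. eyes, mouth, nose."""
        super().__init__()
        self.register_buffer("masks", region_masks.float())
        # One learnable importance weight per facial region.
        self.region_weights = nn.Parameter(torch.ones(region_masks.shape[0]))

    def forward(self, feat):
        """feat: (B, C, H, W) feature map spatially aligned with the masks."""
        w = torch.softmax(self.region_weights, dim=0)      # (R,)
        # Combine region masks into one spatial enhancement map.
        enh = (w[:, None, None] * self.masks).sum(dim=0)   # (H, W)
        # Enhance salient regions while keeping the original features.
        return feat * (1.0 + enh)
```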