2021
DOI: 10.1109/access.2021.3107548
Deep Graph Fusion Based Multimodal Evoked Expressions From Large-Scale Videos

Abstract: In the case of emotion recognition for wild input signals with large variances, multiple sources of noise can challenge the machine's ability to learn an approximate ground truth. There are copious studies on recognizing characters' affective expressions directly through face, speech, and text. However, there is little research on predicting characters' emotions from the content they watch. Therefore, in this paper, we propose a hybrid fusion model, called deep graph fusion, to leverage the combination o…

Cited by 1 publication (2 citation statements)
References 48 publications
“…Ho et al. [36] introduced a novel hybrid fusion model, termed "deep graph fusion," designed to predict viewers' elicited expressions from videos by combining visual and audio representations. The proposed system unfolds in four stages: first, features are extracted for the visual and auditory modalities using CNN-based pre-trained models.…”
Section: Literature Review
confidence: 99%
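The pipeline described in the statement above (modality-specific feature extraction, graph-based fusion over video segments, emotion prediction) might be sketched roughly as follows. This is an illustrative toy version, not the authors' implementation: the feature dimensions, the cosine-similarity adjacency, the single propagation step, and the linear head are all assumptions made for the sketch, and the CNN feature extraction is replaced by random vectors standing in for pre-extracted features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-segment features; stage 1 of the described pipeline
# would obtain these from CNN-based pre-trained backbones instead.
n_segments, d_vis, d_aud, n_emotions = 8, 16, 12, 15
visual = rng.standard_normal((n_segments, d_vis))
audio = rng.standard_normal((n_segments, d_aud))

def cosine_adjacency(feats):
    """Graph over segments: edge weights from cosine similarity (assumed)."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    adj = np.clip(normed @ normed.T, 0.0, None)  # keep non-negative weights
    adj += np.eye(len(feats))                    # self-loops
    return adj / adj.sum(axis=1, keepdims=True)  # row-normalize

def graph_smooth(feats):
    """One propagation step: each segment aggregates its graph neighbors."""
    return cosine_adjacency(feats) @ feats

# Per-modality graph propagation, then late fusion by concatenation.
fused = np.concatenate([graph_smooth(visual), graph_smooth(audio)], axis=1)

# Hypothetical linear head mapping fused features to per-emotion scores.
w = rng.standard_normal((d_vis + d_aud, n_emotions))
scores = fused @ w
print(scores.shape)  # one score vector per video segment
```

In a trained system the adjacency and head weights would be learned end to end; the point of the sketch is only the data flow of per-modality graphs feeding a shared fusion stage.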
“…The deep graph fusion model for video-based emotion prediction exhibits superior performance on the EEV database. Future refinements are suggested to enhance the training approach by considering inter-segment connections and addressing memory-device constraints [36].…”
Section: Deep Graph Fusion Based Multimodal Evoked Expressions From Large-Scale Videos
confidence: 99%