2018
DOI: 10.1609/aaai.v32i1.12021

Memory Fusion Network for Multi-view Sequential Learning

Abstract: Multi-view sequential learning is a fundamental problem in machine learning dealing with multi-view sequences. In a multi-view sequence, there exist two forms of interactions between different views: view-specific interactions and cross-view interactions. In this paper, we present a new neural architecture for multi-view sequential learning called the Memory Fusion Network (MFN) that explicitly accounts for both interactions in a neural architecture and continuously models them through time. The first component of the MFN is called the System of LSTMs, where view-specific interactions are learned in isolation through assigning an LSTM function to each view. Cross-view interactions are then identified using a special attention mechanism called the Delta-memory Attention Network (DMAN) and summarized through time with a Multi-view Gated Memory.
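Since the abstract only sketches the architecture in prose, the following minimal PyTorch sketch may help fix the ideas. It is not the authors' reference implementation: the layer sizes, the single-linear-layer attention, and the single retain gate are simplifying assumptions, and the paper's actual DMAN and Multi-view Gated Memory are more elaborate.

```python
import torch
import torch.nn as nn


class MFNSketch(nn.Module):
    """Illustrative sketch of the MFN components named in the abstract:
    a System of LSTMs, a Delta-memory Attention Network (DMAN), and a
    Multi-view Gated Memory, unrolled jointly through time."""

    def __init__(self, view_dims, hidden=32, mem=64):
        super().__init__()
        # System of LSTMs: one LSTMCell per view, so view-specific
        # interactions are learned in isolation.
        self.cells = nn.ModuleList(nn.LSTMCell(d, hidden) for d in view_dims)
        cat = hidden * len(view_dims)
        # DMAN (simplified): softmax attention over the concatenated
        # LSTM memories at steps t-1 and t.
        self.attn = nn.Sequential(nn.Linear(2 * cat, 2 * cat), nn.Softmax(dim=-1))
        # Multi-view Gated Memory (simplified): a retain gate blends the
        # running memory with a proposed cross-view summary.
        self.proposal = nn.Sequential(nn.Linear(2 * cat, mem), nn.Tanh())
        self.retain = nn.Sequential(nn.Linear(2 * cat, mem), nn.Sigmoid())

    def forward(self, views):
        # views: list of (batch, seq_len, view_dim) tensors, one per view
        b, steps = views[0].shape[0], views[0].shape[1]
        hs = [v.new_zeros(b, c.hidden_size) for v, c in zip(views, self.cells)]
        cs = [h.clone() for h in hs]
        u = views[0].new_zeros(b, self.proposal[0].out_features)
        for t in range(steps):
            c_prev = torch.cat(cs, dim=-1)
            for i, cell in enumerate(self.cells):
                hs[i], cs[i] = cell(views[i][:, t], (hs[i], cs[i]))
            delta = torch.cat([c_prev] + cs, dim=-1)   # memories at t-1 and t
            a = self.attn(delta) * delta               # attended delta-memory
            g = self.retain(a)                         # retain gate in [0, 1]
            u = g * u + (1 - g) * self.proposal(a)     # gated memory update
        return torch.cat(hs + [u], dim=-1)             # final multi-view code


# Usage on random data: three views (e.g. language/visual/acoustic).
mfn = MFNSketch(view_dims=[20, 5, 74])
out = mfn([torch.randn(8, 50, d) for d in (20, 5, 74)])
print(out.shape)  # torch.Size([8, 160]) = 3 * 32 hidden + 64 memory
```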



Cited by 415 publications (122 citation statements)
References 35 publications
“…Consistent with previous works [12, 52], we adopt the metrics of 7-class accuracy (from strongly negative to strongly positive), binary accuracy (positive/negative sentiment), and F1 score (the harmonic mean of binary precision and recall). Specifically, the predicted value is first rounded to the nearest class.…”
Section: Results (citation type: mentioning)
confidence: 99%
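A compact sketch of this evaluation protocol, assuming the usual CMU-MOSI sentiment scale of -3 to +3 and using made-up predictions and labels (all variable names are hypothetical; requires NumPy and scikit-learn):

```python
import numpy as np
from sklearn.metrics import f1_score

# preds: continuous model outputs in [-3, 3]; labels: integer ground truth
preds = np.array([2.4, -0.2, 1.1, -2.8, 0.6])   # hypothetical predictions
labels = np.array([2, 0, 1, -3, 1])             # hypothetical labels

# 7-class accuracy: round predictions to the nearest integer class
# (strongly negative = -3 ... strongly positive = +3) before comparing.
rounded = np.clip(np.rint(preds), -3, 3)
acc7 = float(np.mean(rounded == labels))

# Binary accuracy and F1: collapse to positive vs. negative sentiment.
pred_pos, true_pos = preds > 0, labels > 0
acc2 = float(np.mean(pred_pos == true_pos))
f1 = f1_score(true_pos, pred_pos)  # harmonic mean of precision and recall

print(f"acc7={acc7:.2f}  acc2={acc2:.2f}  f1={f1:.2f}")
```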
“…Heat maps of the results for the multi-modal fine-grained emotion analysis structure based on feature-layer fusion are shown in Figures 9 and 10. As shown in Table 3, the benchmark models of this experiment are HiGRU [29], HiGRU-sf [29], Memnet [30], cLSTM [17], TFN [31], MFN [32], CMU [33], and ICON [10], all of which report the best results for multi-modal fine-grained analysis on the IEMOCAP dataset. Other models, including SVC and LR, were applied after over-sampling the data.…”
Section: Parameter Setting and Results Analysis (citation type: mentioning)
confidence: 99%
“…The most straightforward approach is to directly concatenate feature maps from each modality (Ngiam et al. 2011). To leverage complementary information across different modalities, tensor fusion (Zadeh et al. 2017; Liu et al. 2018), memory fusion (Zadeh et al. 2018a), and factorization fusion (Valada, Mohan, and Burgard 2020) explicitly account for intra-modal and inter-modal dynamics. The above-mentioned methods are aggregation-based fusion, and the modality gap heavily affects cross-modal fusion.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
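To make the contrast concrete, here is a minimal sketch of concatenation fusion versus outer-product tensor fusion in the style of Zadeh et al. (2017); the batch size and feature dimensions are made up:

```python
import torch

# Hypothetical unimodal feature vectors for a batch of 8 samples.
text = torch.randn(8, 16)      # (batch, text_dim)
audio = torch.randn(8, 8)      # (batch, audio_dim)

# Concatenation fusion: stack features side by side; no explicit
# inter-modal terms. Fused dim = 16 + 8 = 24.
concat = torch.cat([text, audio], dim=-1)

# Tensor fusion: append a constant 1 to each modality, then take the
# outer product, so the result contains the unimodal features plus all
# pairwise (inter-modal) products. Fused dim = 17 * 9 = 153.
ones = torch.ones(8, 1)
t1 = torch.cat([text, ones], dim=-1)    # (8, 17)
a1 = torch.cat([audio, ones], dim=-1)   # (8, 9)
fused = torch.einsum('bi,bj->bij', t1, a1).reshape(8, -1)

print(concat.shape, fused.shape)  # torch.Size([8, 24]) torch.Size([8, 153])
```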