2022
DOI: 10.32604/cmc.2022.027097

Triple Multimodal Cyclic Fusion and Self-Adaptive Balancing for Video Q&A Systems

Abstract: The performance of Video Question and Answer (VQA) systems relies on capturing key information from both visual images and natural language in context to generate relevant answers to questions. However, traditional linear combinations of multimodal features focus only on shallow feature interactions and fall far short of the need for deep feature fusion. Attention mechanisms have been used to perform deep fusion, but most of them can only assign weights to single-modal information, leading to attention imbalance…

Cited by 1 publication
References 35 publications