2023
DOI: 10.1371/journal.pone.0287557
Multi-modal adaptive gated mechanism for visual question answering

Abstract: Visual Question Answering (VQA) is a multimodal task that uses natural language to ask and answer questions about image content. For multimodal tasks, obtaining accurate modality feature information is crucial. Existing research on VQA models mainly approaches the problem from the perspectives of attention mechanisms and multimodal fusion, and tends to ignore the impact of modal interaction learning, and of noise introduced during modal fusion, on the overall per…

Cited by 4 publications (1 citation statement)
References 67 publications
“…MAGM [58] introduces an adaptive gating mechanism in the processes of intra-modal learning, inter-modal learning, and modal fusion. This model can effectively filter out irrelevant noise information, obtain fine-grained modal features, and improve the model’s adaptive control over the contribution of the two modal features to answer prediction.…”
Section: Experiments and Results
Confidence: 99%
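The adaptive gating idea described in the citation statement can be illustrated with a minimal sketch: a learned sigmoid gate produces a scalar weight that controls how much each modality contributes to the fused feature. This is a simplified stand-alone illustration, not the paper's actual MAGM formulation; the function names, the scalar (rather than element-wise) gate, and the hand-set weights are all assumptions made for clarity.

```python
import math

def sigmoid(x):
    """Logistic function mapping any real score to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(visual, textual, w_v, w_t, bias):
    """Fuse two modality feature vectors with a single adaptive gate.

    visual, textual : feature vectors of equal length (hypothetical features)
    w_v, w_t, bias  : gate parameters (learned in a real model; fixed here)

    The gate g is computed from both modalities, then the fused feature is
    the convex combination g * visual + (1 - g) * textual, so g directly
    controls each modality's contribution to the prediction.
    """
    score = (sum(wv * v for wv, v in zip(w_v, visual))
             + sum(wt * t for wt, t in zip(w_t, textual))
             + bias)
    g = sigmoid(score)
    return [g * v + (1.0 - g) * t for v, t in zip(visual, textual)]
```

With zero weights and zero bias the gate is 0.5 and both modalities contribute equally; a strongly positive gate score pushes the fused feature toward the visual input, which is the "adaptive control over the contribution of the two modal features" the statement refers to.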