Findings of the Association for Computational Linguistics: EMNLP 2021
DOI: 10.18653/v1/2021.findings-emnlp.196

MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering

Cited by 5 publications (2 citation statements)
References 30 publications
“…Cao et al [162] provided the blueprint for the decision tree and proposed a parse-tree-guided reasoning network for interpretable VQA. Wang et al [163] fashioned a model that learns multimodal interaction representations from trilinear transformers (MIRTT) for VQA tasks. In the domain of video QA, Peng et al [164] unveiled a multilevel hierarchical network (MHN) that takes into account the information spanning various temporal scales.…”
Section: Cognition (mentioning)
confidence: 99%
“…These uncertainties show considerable challenges in the effective training of AI models for these specialized applications. Contrary to addressing these issues, existing methods [5]- [7] often overlook these uncertainties, which often results in limited capabilities in comprehending complex concept hierarchies and a lack of prediction diversity. Therefore, it is imperative to model such multimodal uncertainties.…”
Section: Introduction (mentioning)
confidence: 99%