Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics 2023
DOI: 10.18653/v1/2023.eacl-main.15
Multimodal Graph Transformer for Multimodal Question Answering

Xuehai He,
Xin Wang

Abstract: Despite the success of Transformer models in vision and language tasks, they often learn knowledge from enormous data implicitly and cannot utilize structured input data directly. On the other hand, structured learning approaches such as graph neural networks (GNNs) that integrate prior information can barely compete with Transformer models. In this work, we aim to benefit from both worlds and propose a novel Multimodal Graph Transformer for question answering tasks that require performing reasoning across mu…
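The abstract describes injecting graph-structured priors into a Transformer. One common way to realize that idea, shown here purely as a hypothetical sketch (the function name, shapes, and masking scheme are assumptions, not taken from the paper), is to convert a graph adjacency matrix into an attention mask so that tokens attend only along graph edges:

```python
import numpy as np

def graph_masked_attention(q, k, v, adjacency):
    """Scaled dot-product attention restricted by a graph mask.

    Hypothetical illustration (not the paper's actual method): the
    adjacency matrix of a prior graph is used to mask attention
    logits, so each token attends only to its graph neighbors.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # raw attention logits
    scores = np.where(adjacency > 0, scores, -1e9)   # block non-edges
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy example: 3 tokens on a chain graph 0-1-2, with self-loops.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(3, 4))
v = rng.normal(size=(3, 4))
adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]])
out = graph_masked_attention(q, k, v, adj)
print(out.shape)  # (3, 4)
```

Because token 0 has no edge to token 2, its output is independent of token 2's value vector; this is how a structured prior constrains what the attention layer can mix.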

Cited by 1 publication (4 citation statements); references 39 publications.
“…Early studies (Hannan et al, 2020;Talmor et al, 2021) decompose MMQA into three single-modal QA models. To align different modalities, some latest studies employ graph structures (Yang et al, 2022a;He and Wang, 2023) to enhance the cross-modal interaction.…”
Section: Related Work
confidence: 99%
“…Furthermore, we employ prefix-tuning as our fine-tuning strategy, which adds a task-specific prefix (e.g., MMQA) to the input sequence as a prompt. (Yang et al, 2022a), MGT (He and Wang, 2023), ManymodalQA (Hannan et al, 2020), ORConvQA (Qu et al, 2020), MAE, Solar. Detailed descriptions of each baseline are provided in Appendix A.…”
Section: Sequence-to-sequence Training Procedures
confidence: 99%
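The quoted statement describes prefix-tuning: a learned, task-specific prefix is prepended to the input sequence while the backbone stays frozen. A minimal sketch of that mechanic, with assumed names and dimensions (this is an illustration of the general technique, not the citing paper's implementation):

```python
import numpy as np

def add_task_prefix(token_embeddings, prefix_embeddings):
    """Prepend trainable task-specific prefix vectors to an input.

    Hypothetical illustration of prefix-tuning: only the prefix
    vectors would be trained; the backbone model and the original
    token embeddings are left untouched.
    """
    return np.concatenate([prefix_embeddings, token_embeddings], axis=0)

prefix = np.zeros((2, 8))   # 2 learned prefix vectors (e.g., for MMQA), dim 8
tokens = np.ones((5, 8))    # 5 frozen input token embeddings
seq = add_task_prefix(tokens, prefix)
print(seq.shape)  # (7, 8)
```

During fine-tuning, gradients flow only into `prefix`, which is why the strategy is cheap: the per-task trainable state is just these few prepended vectors.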