Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2018
DOI: 10.1145/3219819.3220036
R-VQA

Abstract: Recently, Visual Question Answering (VQA) has emerged as one of the most significant tasks in multimodal learning as it requires understanding both visual and textual modalities. Existing methods mainly rely on extracting image and question features to learn their joint feature embedding via multimodal fusion or attention mechanism. Some recent studies utilize external VQA-independent models to detect candidate entities or attributes in images, which serve as semantic knowledge complementary to the VQA task. H…
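For readers unfamiliar with the joint feature embedding approach mentioned in the abstract, the following minimal PyTorch sketch shows how image and question features can be fused element-wise into a joint embedding scored over candidate answers. All module names and dimensions are illustrative assumptions; this is a generic fusion baseline, not the R-VQA implementation.

# Minimal sketch of joint feature embedding via multimodal fusion for VQA.
# Generic illustration only; dimensions and names are assumptions.
import torch
import torch.nn as nn

class SimpleFusionVQA(nn.Module):
    def __init__(self, img_dim=2048, q_dim=1024, hidden=512, num_answers=3000):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)   # project CNN image features
        self.q_proj = nn.Linear(q_dim, hidden)       # project question embedding (e.g. LSTM output)
        self.classifier = nn.Linear(hidden, num_answers)

    def forward(self, img_feat, q_feat):
        # Element-wise (Hadamard) product fuses the two modalities into one joint embedding.
        joint = torch.tanh(self.img_proj(img_feat)) * torch.tanh(self.q_proj(q_feat))
        return self.classifier(joint)                # scores over candidate answers

# Usage: a batch of 8 image/question feature pairs.
model = SimpleFusionVQA()
scores = model(torch.randn(8, 2048), torch.randn(8, 1024))
print(scores.shape)  # torch.Size([8, 3000])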

Cited by 61 publications (2 citation statements)
References 32 publications
“…Visual Genome datasets are used for various tasks, such as scene graphics, image extraction, and basic neural understanding of scenes. There is a number of datasets that are based on Visual Genome, like Relation-VQA [18] where each triplet (image, question, answer) is supported with relation fact. Fact-based Visual Question Answering (FVQA) [10] and multihop and multimodal QA (WebQA) [2] datasets use external knowledge outside of text and images.…”
Section: Related Datasets (mentioning, confidence: 99%)
“…With this model, the researchers demonstrated the impact of all potential combinations of recurrent and convolutional dual attention. Lu et al (2018b) proposed a novel sequential attention mechanism to seamlessly combine visual and semantic clues for VQA.…”
Section: Application of Attention Mechanisms in VQA (mentioning, confidence: 99%)
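The statement above refers to attention mechanisms that combine visual and semantic clues for VQA. As a rough illustration only, the sketch below shows a generic question-guided attention step over image regions; it is an assumed simplification for exposition, not the sequential attention model of Lu et al. (2018b).

# Generic question-guided attention over image regions (illustrative assumption).
import torch
import torch.nn as nn

class RegionAttention(nn.Module):
    def __init__(self, region_dim=2048, q_dim=1024, hidden=512):
        super().__init__()
        self.proj_v = nn.Linear(region_dim, hidden)  # project region features
        self.proj_q = nn.Linear(q_dim, hidden)       # project question feature
        self.score = nn.Linear(hidden, 1)            # scalar attention score per region

    def forward(self, regions, q_feat):
        # regions: (batch, num_regions, region_dim); q_feat: (batch, q_dim)
        h = torch.tanh(self.proj_v(regions) + self.proj_q(q_feat).unsqueeze(1))
        weights = torch.softmax(self.score(h), dim=1)    # attention weights over regions
        return (weights * regions).sum(dim=1)            # question-attended visual feature

# Usage: 8 images with 36 region features each, plus question features.
attn = RegionAttention()
attended = attn(torch.randn(8, 36, 2048), torch.randn(8, 1024))
print(attended.shape)  # torch.Size([8, 2048])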