Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.44
ConceptBert: Concept-Aware Representation for Visual Question Answering

Abstract: Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Current works in VQA focus on questions that are answerable by direct analysis of the question and image alone. We present a concept-aware algorithm, ConceptBert, for questions that require common sense or basic factual knowledge from external structured content. Given an image and a question in natural language, ConceptBert requires visu…

Cited by 91 publications (54 citation statements)
References 20 publications
“…Mucko (Zhu et al, 2020) goes a step further, reasoning on visual, fact, and semantic graphs separately, and uses cross-modal networks to aggregate them together. ConceptBert (Gardères et al, 2020) combines the BERT-pretrained model (Devlin et al, 2019) with KG. It encodes the KG using a transformer with a BERT embedding query.…”
Section: Related Work
confidence: 99%
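The statement above describes ConceptBert as encoding a knowledge graph with a transformer that is queried by BERT embeddings. The sketch below is only an illustration of that idea, not the authors' implementation: BERT question-token embeddings attend over retrieved knowledge-graph concept embeddings through a single cross-attention block. Module names, dimensions, and the retrieval step are assumptions.

```python
# Minimal, hypothetical sketch of concept-aware fusion (not ConceptBert's code):
# BERT question embeddings act as queries over knowledge-graph entity embeddings.
import torch
import torch.nn as nn

class ConceptAwareFusion(nn.Module):
    def __init__(self, hidden_dim=768, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(hidden_dim, 4 * hidden_dim),
            nn.GELU(),
            nn.Linear(4 * hidden_dim, hidden_dim),
        )
        self.norm1 = nn.LayerNorm(hidden_dim)
        self.norm2 = nn.LayerNorm(hidden_dim)

    def forward(self, question_emb, kg_entity_emb):
        # question_emb: (batch, q_len, hidden) BERT outputs for the question tokens
        # kg_entity_emb: (batch, n_concepts, hidden) embeddings of retrieved KG concepts
        attended, _ = self.cross_attn(question_emb, kg_entity_emb, kg_entity_emb)
        x = self.norm1(question_emb + attended)
        return self.norm2(x + self.ffn(x))

# Toy usage with random tensors standing in for real BERT / KG features.
fusion = ConceptAwareFusion()
q = torch.randn(2, 16, 768)    # question token embeddings
kg = torch.randn(2, 32, 768)   # retrieved concept embeddings
out = fusion(q, kg)            # (2, 16, 768) concept-aware question representation
```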
“…Recent approaches have shown great potential to incorporate external knowledge for knowledge-based VQA. Several methods explore aggregating the external knowledge either in the form of structured knowledge graphs (Garderes et al, 2020; Narasimhan et al, 2018; Li et al, 2020b; Wang et al, 2017a,b) or unstructured knowledge bases (Marino et al, 2021; Wu et al, 2021; Luo et al, 2021). In these methods, object detectors (Ren et al, 2015) and scene classifiers (He et al, 2016) are used to associate images with external knowledge.…”
Section: Knowledge-based
confidence: 99%
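The statement above notes that object detectors and scene classifiers are used to associate images with external knowledge. The fragment below is a deliberately simplified, hypothetical Python sketch of that association step: detected object labels and a scene label serve as keys into a small structured knowledge base. The KB format and helper names are illustrative, not taken from any of the cited papers.

```python
# Hypothetical retrieval step: map detector/classifier labels to KB facts.
from typing import Dict, List

def retrieve_facts(object_labels: List[str],
                   scene_label: str,
                   knowledge_base: Dict[str, List[str]]) -> List[str]:
    """Collect the facts associated with every concept detected in the image."""
    facts: List[str] = []
    for label in object_labels + [scene_label]:
        facts.extend(knowledge_base.get(label, []))
    return facts

# Toy knowledge base with ConceptNet-style triples flattened to strings.
kb = {
    "umbrella": ["umbrella UsedFor protection-from-rain"],
    "beach": ["beach RelatedTo ocean"],
}
print(retrieve_facts(["umbrella"], "beach", kb))
```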
“…Further, external APIs, such as Google (Wu et al, 2021; Luo et al, 2021), Microsoft (Yang et al, 2021), and OCR (Luo et al, 2021; Wu et al, 2021), are used to enrich the associated knowledge. Finally, pre-trained transformer-based language models (Yang et al, 2021) or multimodal models (Wu et al, 2021; Luo et al, 2021; Garderes et al, 2020; Marino et al, 2021) are leveraged as implicit knowledge bases for answer predictions.…”
Section: Knowledge-based
confidence: 99%
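The statement above mentions pre-trained language or multimodal models being used as implicit knowledge bases for answer prediction. One common realization of this idea, shown here as an assumed example rather than the pipeline of any cited paper, is to verbalize the image context and question into a cloze prompt and let a masked language model rank answer candidates. The prompt template and model choice below are illustrative.

```python
# Hedged sketch: query a pre-trained masked LM as an implicit knowledge source.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Verbalized image context (e.g., from a captioner) plus the question, rewritten
# as a cloze statement; the template is an assumption for illustration.
prompt = "Context: a man holding an umbrella on a beach. An umbrella is used for [MASK]."

candidates = fill(prompt)
print([c["token_str"] for c in candidates[:5]])  # top candidate answer tokens
```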
“…Incorporating external knowledge into VQA models combines visual observations with external knowledge (Garderes et al, 2020). Structured databases that organize and store this external knowledge, such as knowledge bases (KBs), have become important resources for representing general knowledge.…”
Section: Introduction
confidence: 99%