2021
DOI: 10.1016/j.knosys.2021.107408

KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for visual commonsense reasoning

Cited by 27 publications (9 citation statements)
References 24 publications
“…This is not entirely hypothetical. The KVL-BERT system (Song et al, 2021) uses ConceptNet as a resource to answer questions about images. However, KVL-BERT only uses the fact that two concepts are connected in ConceptNet; it entirely ignores the label and direction on the arc between them.…”
Section: An Untrue Claim About Commonsense Knowledge (mentioning)
confidence: 99%
“…Transformer-based endeavors for knowledge-assisted VCR (K-VCR) naturally utilize BERT [27] as the backbone architecture to construct end-to-end KVL models. In KVL-BERT [95], the input Q together with candidate answers A guide the retrieval of relevant commonsense facts [24], resulting in a knowledge-enriched linguistic input. Then, visual features among with this enriched input are inserted in a BERT-like VL model (VL-BERT [96]) so that the correct A is selected.…”
Section: Visual Commonsense Reasoning (VCR) (mentioning)
confidence: 99%
“…Researchers have also introduced external knowledge in other tasks such as language generation (Ji et al 2020). Song et al (Song et al 2021) retrieved entity-based knowledge from ConceptNet (Speer, Chin, and Havasi 2017) for visual commonsense reasoning. Garcia et al ) retrieved video-relevant plot summary as external knowledge in a weakly supervised fashion for video question answering.…”
Section: Knowledge-enhanced Reasoning (mentioning)
confidence: 99%
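
The statements above describe two behaviours attributed to KVL-BERT: retrieving ConceptNet facts for concepts in the question and candidate answers, and using only the fact that two concepts are connected while ignoring the relation label and direction. The following is a minimal Python sketch of that idea, not the authors' implementation. The triples are toy data rather than real ConceptNet entries, and the names `TOY_TRIPLES`, `build_undirected_index`, `enrich_tokens`, and `max_facts` are hypothetical, introduced only for illustration. The later step of feeding the enriched text together with visual features into a VL-BERT-style model is not shown.

```python
from collections import defaultdict

# Toy (head, relation, tail) triples; illustrative only, not real ConceptNet data.
TOY_TRIPLES = [
    ("umbrella", "UsedFor", "rain"),
    ("rain", "RelatedTo", "wet"),
    ("dog", "IsA", "animal"),
]


def build_undirected_index(triples):
    """Map each concept to its neighbours, discarding relation label and direction."""
    index = defaultdict(set)
    for head, _relation, tail in triples:
        # Store the edge both ways, so direction is lost; the label is never stored.
        index[head].add(tail)
        index[tail].add(head)
    return index


def enrich_tokens(tokens, index, max_facts=2):
    """Insert up to max_facts neighbour concepts after each token found in the index."""
    enriched = []
    for token in tokens:
        enriched.append(token)
        # Sort for a deterministic order; tokens without an entry add nothing.
        neighbours = sorted(index.get(token.lower(), ()))[:max_facts]
        enriched.extend(neighbours)
    return enriched


index = build_undirected_index(TOY_TRIPLES)
question = "why is she holding an umbrella".split()
print(enrich_tokens(question, index))
```

Because the index keeps only connectivity, a lookup for "rain" returns both "umbrella" and "wet" without saying which concept is used for which, which is the limitation the first statement points out.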