KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for visual commonsense reasoning

Song, Dandan; Ma, Siyi; Sun, Zhanchen; Yang, Sicheng; Liao, Lejian

doi:10.1016/j.knosys.2021.107408

Cited by 27 publications (9 citation statements)

References 24 publications


“…This is not entirely hypothetical. The KVL-BERT system (Song et al, 2021) uses ConceptNet as a resource to answer questions about images. However, KVL-BERT only uses the fact that two concepts are connected in ConceptNet; it entirely ignores the label and direction on the arc between them.…”

Section: An Untrue Claim About Commonsense Knowledge (mentioning, confidence: 99%)

Benchmarks for Automated Commonsense Reasoning: A Survey

Davis

2023

Preprint


More than one hundred benchmarks have been developed to test the commonsense knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems. However, these benchmarks are often flawed and many aspects of common sense remain untested. Consequently, we do not currently have any reliable way of measuring to what extent existing AI systems have achieved these abilities. This paper surveys the development and uses of AI commonsense benchmarks. We discuss the nature of common sense; the role of common sense in AI; the goals served by constructing commonsense benchmarks; and desirable features of commonsense benchmarks. We analyze the common flaws in benchmarks, and we argue that it is worthwhile to invest the work needed to ensure that benchmark examples are consistently high quality. We survey the various methods of constructing commonsense benchmarks. We enumerate 139 commonsense benchmarks that have been developed: 102 text-based, 18 image-based, 12 video-based, and 7 simulated physical environments. We discuss the gaps in the existing benchmarks and aspects of commonsense reasoning that are not addressed in any existing benchmark. We conclude with a number of recommendations for future development of commonsense AI benchmarks.
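The distinction Davis draws in the quoted passage, between knowing only that two ConceptNet concepts are connected and knowing the label and direction of the arc between them, can be illustrated with a minimal sketch. It assumes a toy list of ConceptNet-style (head, relation, tail) triples, not real ConceptNet data, and the names `Edge`, `connected`, and `relations` are hypothetical; they are not part of KVL-BERT or of any ConceptNet API.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """A directed, labeled edge in the (head, relation, tail) style of ConceptNet."""
    head: str
    relation: str
    tail: str


# Toy triples for illustration only; not taken from ConceptNet itself.
EDGES = [
    Edge("dog", "IsA", "animal"),
    Edge("knife", "UsedFor", "cutting"),
    Edge("fire", "Causes", "smoke"),
]


def connected(a: str, b: str, edges: list[Edge]) -> bool:
    """Label- and direction-blind view: is there any edge joining a and b?"""
    return any({e.head, e.tail} == {a, b} for e in edges)


def relations(a: str, b: str, edges: list[Edge]) -> list[str]:
    """Label- and direction-aware view: which relations point from a to b?"""
    return [e.relation for e in edges if e.head == a and e.tail == b]


# The blind view cannot distinguish "dog IsA animal" from "animal IsA dog".
print(connected("dog", "animal", EDGES), connected("animal", "dog", EDGES))  # True True
print(relations("dog", "animal", EDGES), relations("animal", "dog", EDGES))  # ['IsA'] []
```

The connectivity-only view answers `True` in both directions, while the labeled, directed view recovers that only "dog IsA animal" holds. This is the information the quoted passage says is discarded.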

“…Transformer-based endeavors for knowledge-assisted VCR (K-VCR) naturally utilize BERT [27] as the backbone architecture to construct end-to-end KVL models. In KVL-BERT [95], the input Q together with candidate answers A guide the retrieval of relevant commonsense facts [24], resulting in a knowledge-enriched linguistic input. Then, visual features along with this enriched input are inserted in a BERT-like VL model (VL-BERT [96]) so that the correct A is selected.…”

Section: Visual Commonsense Reasoning (VCR) (mentioning, confidence: 99%)

The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges

Lymperaiou, Stamou

2023

Preprint


Recent advancements in visiolinguistic (VL) learning have allowed the development of multiple models and techniques that offer several impressive implementations, able to currently resolve a variety of tasks that require the collaboration of vision and language. Current datasets used for VL pre-training only contain a limited amount of visual and linguistic knowledge, thus significantly limiting the generalization capabilities of many VL models. External knowledge sources such as knowledge graphs (KGs) and Large Language Models (LLMs) are able to cover such generalization gaps by filling in missing knowledge, resulting in the emergence of hybrid architectures. In the current survey, we analyze tasks that have benefited from such hybrid approaches. Moreover, we categorize existing knowledge sources and types, proceeding to a discussion of the KG vs. LLM dilemma and its potential impact on future hybrid approaches.
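The KVL-BERT pipeline summarized in the Lymperaiou and Stamou excerpt has three steps: the question and each candidate answer drive retrieval of commonsense facts, the retrieved facts enrich the linguistic input, and a VL model picks the best candidate. A minimal, text-only sketch of that flow follows, under loudly stated assumptions. The facts are toy triples rather than ConceptNet data, and retrieval is naive word matching. Knowledge is appended as plain-text segments joined by `[SEP]`, and `score` is a placeholder for the visual-linguistic model (VL-BERT in KVL-BERT). The functions `retrieve`, `build_inputs`, and `select_answer` are hypothetical and do not come from the KVL-BERT code. The model's actual knowledge-injection mechanism and its use of image features are not reproduced here.

```python
from typing import Callable

# Toy (head, relation, tail) facts; illustrative only, not real ConceptNet data.
FACTS = [
    ("umbrella", "UsedFor", "staying dry"),
    ("rain", "Causes", "wet ground"),
    ("sun", "HasProperty", "bright"),
]


def retrieve(text: str, facts: list[tuple[str, str, str]]) -> list[str]:
    """Naive lexical retrieval: keep facts whose head word occurs in the text."""
    words = set(text.lower().replace("?", " ").replace(".", " ").split())
    return [f"{h} {r} {t}" for h, r, t in facts if h in words]


def build_inputs(question: str, answers: list[str], facts) -> list[str]:
    """Build one knowledge-enriched text input per candidate answer.

    Question and candidate together drive retrieval; the retrieved facts are
    simply appended as extra segments (a stand-in for the real injection step).
    """
    inputs = []
    for answer in answers:
        knowledge = retrieve(f"{question} {answer}", facts)
        inputs.append(" [SEP] ".join([question, answer, *knowledge]))
    return inputs


def select_answer(inputs: list[str], score: Callable[[str], float]) -> int:
    """Return the index of the best-scoring input; `score` stands in for the VL model."""
    return max(range(len(inputs)), key=lambda i: score(inputs[i]))


question = "Why is person1 holding an umbrella?"
answers = ["It is going to rain.", "The sun is bright."]
inputs = build_inputs(question, answers, FACTS)
for text in inputs:
    print(text)
# Toy scorer for demonstration only; a real system would also use image features.
print(select_answer(inputs, score=lambda s: s.count("Causes")))
```

Because retrieval runs once per candidate answer, each candidate gets its own knowledge context. This mirrors the excerpt's point that Q together with A guides retrieval, rather than Q alone.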


“…Researchers have also introduced external knowledge in other tasks such as language generation (Ji et al 2020). Song et al (Song et al 2021) retrieved entity-based knowledge from ConceptNet (Speer, Chin, and Havasi 2017) for visual commonsense reasoning. Garcia et al ) retrieved video-relevant plot summary as external knowledge in a weakly supervised fashion for video question answering.…”

Section: Knowledge-enhanced Reasoning (mentioning, confidence: 99%)

Inferential Knowledge-Enhanced Integrated Reasoning for Video Question Answering

Mao, Jiang, Liu, et al.

2023

AAAI


Recently, video question answering has attracted growing attention. It involves answering a question based on a fine-grained understanding of video multi-modal information. Most existing methods have successfully explored the deep understanding of visual modality. We argue that a deep understanding of linguistic modality is also essential for answer reasoning, especially for videos that contain character dialogues. To this end, we propose an Inferential Knowledge-Enhanced Integrated Reasoning method. Our method consists of two main components: 1) an Inferential Knowledge Reasoner to generate inferential knowledge for linguistic modality inputs that reveals deeper semantics, including the implicit causes, effects, mental states, etc. 2) an Integrated Reasoning Mechanism to enhance video content understanding and answer reasoning by leveraging the generated inferential knowledge. Experimental results show that our method achieves significant improvement on two mainstream datasets. The ablation study further demonstrates the effectiveness of each component of our approach.
