Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.289

CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images

Abstract: Most existing research on visual question answering (VQA) is limited to information explicitly present in an image or a video. In this paper, we take visual understanding to a higher level where systems are challenged to answer questions that involve mentally simulating the hypothetical consequences of performing specific actions in a given scenario. Towards that end, we formulate a vision-language question answering task based on the CLEVR (Johnson et al., 2017a) dataset. We then modify the best existing VQA m…
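
The task the abstract describes can be pictured with a small symbolic sketch: a scene, a hypothetical action that is "mentally" simulated (here, literally applied to a copy of the scene), and a question answered over the result. The field names, action types, and example below are illustrative assumptions, not the actual CLEVR_HYP annotation format.

```python
# Illustrative sketch of a CLEVR_HYP-style instance: the scene is a symbolic
# object list (in the spirit of CLEVR scene graphs), the hypothetical action is
# applied to a copy of the scene, and the question is answered over the updated
# scene. The schema and the tiny action/question vocabulary are assumptions.

from copy import deepcopy

scene = [
    {"id": 0, "shape": "cube", "color": "red", "material": "metal"},
    {"id": 1, "shape": "sphere", "color": "blue", "material": "rubber"},
    {"id": 2, "shape": "cube", "color": "blue", "material": "rubber"},
]

def apply_action(scene, action):
    """Simulate a hypothetical action on a copy of the scene."""
    updated = deepcopy(scene)
    if action["type"] == "remove":          # e.g. "remove all rubber cubes"
        updated = [o for o in updated
                   if not all(o[k] == v for k, v in action["filter"].items())]
    elif action["type"] == "recolor":       # e.g. "paint the red metal cube blue"
        for o in updated:
            if all(o[k] == v for k, v in action["filter"].items()):
                o["color"] = action["new_color"]
    return updated

def answer_count(scene, query):
    """Answer a simple counting question over the (updated) scene."""
    return sum(all(o[k] == v for k, v in query.items()) for o in scene)

# "If the red metal cube is painted blue, how many blue objects are there?"
hypothetical = {"type": "recolor",
                "filter": {"color": "red", "material": "metal"},
                "new_color": "blue"}
updated_scene = apply_action(scene, hypothetical)
print(answer_count(updated_scene, {"color": "blue"}))  # -> 3
```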

Cited by 9 publications (2 citation statements). References: 32 publications.

“…Bisk et al. (2020) designed the PIQA benchmark to evaluate physical commonsense reasoning in LLMs through question answering. Sampat et al. (2021) proposed an extension to the CLEVR dataset, where an agent must reason and answer questions about a scene after a hypothetical action is taken.…”
Section: Related Work (mentioning)
confidence: 99%
“…Theorem proving (Alvandi and Watt 2019; Zhelezniakov, Zaytsev, and Radyvonenko 2021) and handwritten formula recognition (Dai et al. 2019; Sinha et al. 2019) are typical cognitive tasks that require perceiving mathematical symbols in pictures and calculating results based on mathematical theorems. CLEVR (Johnson et al. 2017; Sampat et al. 2021) also provides a benchmark that introduces logical reasoning into visual question answering. Sinha et al. (2019) define the CLUTRR benchmark with robustness and generalization evaluations to measure models' perception and reasoning abilities, respectively.…”
Section: Related Work: Cognitive Task (mentioning)
confidence: 99%