2023
DOI: 10.48550/arxiv.2302.11713
Preprint

Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?

Abstract: Large language models [5,7] have demonstrated an emergent capability in answering knowledge-intensive questions. With recent progress on web-scale visual and language pre-training [2,6,38], do these models also understand how to answer visual information-seeking questions? To answer this question, we present INFOSEEK, a Visual Question Answering dataset that focuses on asking information-seeking questions, where the questions cannot be answered with common sense knowledge. We perform a multi-stage human annotation…

Cited by 2 publications (1 citation statement)
References: 45 publications
“…A few works focus on direct question answering on charts, such as DVQA [11], FigureQA [9], and PlotQA [10], and made their datasets public. Meanwhile, Chen et al. [22] introduced a benchmark dataset for visual information-seeking questions in natural images. The work by Samira et al. [9] covers five distinct types of charts: line, dot-line, vertical and horizontal bar charts, and pie plots.…”
Section: B. Question Answering on Charts
Citation type: mentioning
Confidence: 99%