2021
DOI: 10.1016/j.imavis.2021.104327

A survey of methods, datasets and evaluation metrics for visual question answering

citations
Cited by 30 publications
(15 citation statements)
references
References 36 publications
“…While impressive in their ability to solve traditional computer vision tasks such as detection and recognition, these models still exhibit limitations toward reasoning and inference over how people think and talk about the world. For example, VQA models are biased by how questions are asked (Sharma & Jalal, 2021) and the reasoning behind their output is often opaque (Khan et al, 2022). Thus, it is difficult to interpret the errors they make or whether their reasoning incorporates any structural elements of either individual or shared agency.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…7 Donahue et al 9 presented a recurrent convolutional architecture that offered simultaneous learning of temporal dynamics and convolutional perceptual representations. In this sequence, Yang and Xu 12 proposed a visual question answering (VQA) [17][18][19]-based caption generation model to understand the image content in a deeper way using the knowledge learned from the VQA algorithm by asking questions about a given image.…”
Section: CNN-based Methods
Citation type: mentioning
confidence: 99%
“…Other popular VQA datasets are Flickr30k-Entities [26], COCO-QA [27], Visual7W [28] and others. For a detailed analysis on VQA and relevant topics we refer readers to recent specialized survey papers [1,2,3,29,30,31].…”
Section: Related Work
Citation type: mentioning
confidence: 99%