2021
DOI: 10.48550/arxiv.2101.05954
Preprint

Recent Advances in Video Question Answering: A Review of Datasets and Methods

Devshree Patel,
Ratnam Parikh,
Yesha Shastri

Abstract: Video Question Answering (VQA) is a recently emerging and challenging task in the field of Computer Vision. Several visual information retrieval techniques, such as Video Captioning/Description and Video-guided Machine Translation, have preceded the task of VQA. VQA involves retrieving temporal and spatial information from video scenes and interpreting it. In this survey, we review a number of methods and datasets for the task of VQA. To the best of our knowledge, no previous survey has been conducted for the VQA task.

Cited by 1 publication (2 citation statements)
References 46 publications
“…Neural attention mechanisms have become the de facto standard in machine comprehension tasks (Sood et al., 2020; Yu et al., 2019; Li et al., 2019). In VideoQA, attention mechanisms are particularly important given that the information necessary to generate correct answers is scattered across frames, many of which are redundant or even irrelevant to the question at hand (Patel et al., 2021).…”
Section: Related Work
Confidence: 99%
“…tried to close by leveraging external memory (Kim et al., 2018, 2019; Fan et al., 2019). While external memory allows models to cache sequential information and retrieve relevant multimodal content (Patel et al., 2021), latest models still suffer from decreased performance, for example on ambiguous questions that require deeper reasoning abilities.…” [Footnote 1 from the citing paper: Our code is publicly available at the project website https://www.perceptualui.org/publications/abdessaied22_repl4NLP/]
Section: Introduction
Confidence: 99%