2017 IEEE International Conference on Computer Vision (ICCV) 2017
DOI: 10.1109/iccv.2017.312

MarioQA: Answering Questions by Watching Gameplay Videos

Abstract: We present a framework to analyze various aspects of models for video question answering (VideoQA) using customizable synthetic datasets, which are constructed automatically from gameplay videos. Our work is motivated by the fact that existing models are often tested only on datasets that require excessively high-level reasoning or mostly contain instances accessible through single frame inferences. Hence, it is difficult to measure capacity and flexibility of trained models, and existing techniques often rely…

Cited by 82 publications (51 citation statements)
References 22 publications
“…Compared with images, the temporal domain is unique to videos. A temporal attention mechanism is leveraged to selectively attend to one or more periods of a video in [16,24,35]. Besides temporal attention mechanisms…”
Section: Related Work
confidence: 99%
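The temporal attention idea described in the excerpt can be sketched minimally as follows. This is an illustrative toy, not any cited model's implementation: it uses dot-product scoring between a question embedding and per-frame features, and all names and dimensions are hypothetical.

```python
import numpy as np

def temporal_attention(frame_feats, query):
    """Toy temporal attention over video frame features.

    frame_feats: (T, D) array, one feature vector per frame.
    query: (D,) question embedding.
    Returns softmax attention weights (T,) and the attended video summary (D,).
    """
    scores = frame_feats @ query      # (T,) relevance of each frame to the question
    scores = scores - scores.max()    # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum() # softmax over the time axis
    summary = weights @ frame_feats   # weighted average: frames the question attends to
    return weights, summary

# Usage: 5 frames with 4-dim features, random question embedding
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 4))
q = rng.normal(size=4)
w, s = temporal_attention(feats, q)
```

The weights form a distribution over time, so the model can focus on one or more relevant periods of the video rather than a single frame.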
“…More recently, modular networks [4,20,25] that construct an explicit representation of the reasoning process by exploiting the compositional nature of language have been proposed. Similar architectures have also been applied to the video domain with extensions such as spatiotemporal attention [23,49]. Our proposed approach to question answering allows the agent to interact with its environment and is thus fundamentally different from past QA approaches.…”
Section: Related Work
confidence: 99%
“…In the video domain, the TGIF-QA (Jang et al., 2017) and MarioQA (Mun et al., 2016) datasets provide opportunities to study temporal reasoning for the task of VQA. The TGIF-QA dataset considers three types of temporal questions: before/after questions, repetition count, and determining a repeating action.…”
Section: Related Work
confidence: 99%