2019
DOI: 10.1007/978-3-030-11009-3_32

Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions

Abstract: In-depth scene descriptions and question answering tasks have greatly increased the scope of today's definition of scene understanding. While such tasks are in principle open ended, current formulations primarily focus on describing only the current state of the scenes under consideration. In contrast, in this paper, we focus on the future states of the scenes which are also conditioned on actions. We posit this as a question answering task, where an answer has to be given about a future scene state, given obs…
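
The task formulation sketched in the abstract can be made concrete with a small, purely illustrative example (not the paper's actual data format; all field names here are hypothetical): a current scene, an action performed on one object, and a what-if question whose answer describes the predicted future scene state.

# Hypothetical what-if QA instance; field names are assumptions for illustration only.
example = {
    "scene": [
        {"object": "red_cup",   "position": [0.10, 0.40]},
        {"object": "blue_book", "position": [0.35, 0.20]},
    ],
    "action": {"type": "push", "target": "red_cup", "direction": "left"},
    "question": "Where will the red cup be after the push?",
    "answer": "near the left edge of the table",
}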

Cited by 18 publications (18 citation statements). References 26 publications.

“…(a benchmark dataset for "physical intelligence") (Wagner et al., 2018) has some similarity with ours. It has synthetically rendered table-top scenes, four types of actions (push, rotate, remove and drop) being performed on an object and what-if questions.…”
Section: Related Work (mentioning)
confidence: 81%
“…The research design was descriptive qualitative, related to developing physics testlet templates. The templates focused on developing knowledge structures [12] through provoking deeper context-understanding of what-if questions [13]. The testlet accommodated the context expansion of the problems to reveal more comprehensive understanding.…”
Section: Methods (mentioning)
confidence: 99%
“…Since labeled training examples are not readily available in many domains, researchers have explored approaches that simulate labeled data or use prior knowledge to constrain learning. For instance, physics engines have been used to generate labeled data for training deep networks that predict the movement of objects in response to external forces [17,43,60], or for understanding the physics of scenes [5]. A recurrent neural network (RNN) architecture augmented by arithmetic and logical operations has been used to answer questions about scenes [44], but it used textual information instead of the more informative visual data and did not support reasoning with commonsense knowledge.…”
Section: Related Work (mentioning)
confidence: 99%
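
The last snippet notes that physics engines have been used to generate labeled data for predicting object motion under external forces. A minimal sketch of that idea, assuming the pybullet package (this is not the cited works' actual pipeline), simulates a push on a table-top object and records the before/after positions as one labeled example.

# Minimal sketch: simulate a "push" and record (initial state, action, final state).
# Assumes the pybullet and pybullet_data packages; not the cited authors' code.
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                                    # headless physics simulation
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")                               # ground / table-top plane
cube = p.loadURDF("cube_small.urdf", basePosition=[0, 0, 0.05])

start_pos, _ = p.getBasePositionAndOrientation(cube)
for _ in range(60):                                    # apply a brief lateral force
    p.applyExternalForce(cube, -1, forceObj=[5, 0, 0],
                         posObj=list(start_pos), flags=p.WORLD_FRAME)
    p.stepSimulation()
for _ in range(240):                                   # let the object settle
    p.stepSimulation()
end_pos, _ = p.getBasePositionAndOrientation(cube)

# One labeled training example: (initial scene, action) -> future state.
example = {"initial_position": start_pos, "action": "push", "final_position": end_pos}
print(example)
p.disconnect()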