2019
DOI: 10.48550/arxiv.1905.10226
Preprint
Deep Reason: A Strong Baseline for Real-World Visual Reasoning

Abstract: This paper presents a strong baseline for real-world visual reasoning (GQA), which achieved 60.93% accuracy in the GQA 2019 challenge and won sixth place. GQA is a large dataset with 22M questions involving spatial understanding and multi-step inference. To help further research in this area, we identify three crucial components that improve performance: multi-source features, a fine-grained encoder, and a score-weighted ensemble. We provide a series of analyses of their impact on performance.
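The abstract names a score-weighted ensemble as one of the three components but does not spell out how the model outputs are combined. A minimal sketch is shown below, assuming each model produces a probability distribution over candidate answers and that the weights come from per-model validation scores; the function name, the weighting scheme, and the use of NumPy are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def score_weighted_ensemble(answer_probs, val_scores):
    """Combine per-model answer distributions with score-derived weights.

    answer_probs: list of (num_questions, num_answers) arrays, one per model.
    val_scores: per-model validation scores used as ensemble weights (assumed).
    Returns the index of the highest-weighted answer for each question.
    """
    weights = np.asarray(val_scores, dtype=np.float64)
    weights = weights / weights.sum()                 # normalize weights to sum to 1
    stacked = np.stack(answer_probs, axis=0)          # (num_models, Q, A)
    combined = np.tensordot(weights, stacked, axes=1) # weighted sum over models -> (Q, A)
    return combined.argmax(axis=-1)                   # predicted answer index per question

# Example with three hypothetical models, 2 questions, 4 candidate answers.
rng = np.random.default_rng(0)
probs = [rng.dirichlet(np.ones(4), size=2) for _ in range(3)]
print(score_weighted_ensemble(probs, val_scores=[0.58, 0.60, 0.61]))
```

Weighting by a held-out score rather than averaging uniformly lets stronger members dominate the ensemble; whether the paper uses exactly this scheme is not stated in the abstract.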

Cited by 1 publication (1 citation statement)
References 10 publications
“…GQA Models: GQA was introduced in [17] for real-world visual reasoning. Simple monolithic networks [40], the MAC network [16], and language-conditioned graph neural networks [15,11] have been developed for this task. LXMERT [37], a large-scale pre-trained encoder, has also been tested on this dataset.…”
Section: Related Work (mentioning)
confidence: 99%