2019
DOI: 10.48550/arxiv.1904.08920
Preprint

Towards VQA Models That Can Read

Cited by 6 publications (7 citation statements)
References 0 publications
“…and ask about objects, instead of asking "where" and "why" questions. While answering questions about text in images is currently an open research problem known as TextVQA [38,41,68,77], inspired by this statistic, we augment our descriptions with text extracted from video frames.…”
Section: Toward Automated Question Answering (mentioning)
confidence: 99%
“…Bottom-up-attention features are proposed by [1] who won the first place in the 2017 VQA Challenge. Pythia features are provided by [12], who is the VQA 2018 challenge winner. As we see in Tab 1, Pythia features perform better than bottom-up-attention features, and they have a significant gain than object features for about 3%.…”
Section: Multi-source Image Features, 2.1.1 Incorporating Better Detecti... (mentioning)
confidence: 99%
“…Followed by the early works like [1,12], we use the common practice of ensembling several models to obtain better performance. We choose the best ones of all settings above and try different weights when summing the prediction scores.…”
Section: Weighted Ensemble (mentioning)
confidence: 99%
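The weighted-ensemble statement above describes summing the prediction scores of several models with tuned weights. A minimal sketch of that idea, assuming each model outputs a score vector over a shared answer vocabulary (the model count, weights, and scores below are illustrative, not taken from the cited work):

```python
import numpy as np

# Minimal sketch of score-level weighted ensembling, assuming each model
# produces a score vector over the same answer vocabulary. The weights and
# scores are placeholders, not values from the cited work.

def weighted_ensemble(score_matrix, weights):
    """Sum per-model answer scores with per-model weights and pick the argmax.

    score_matrix: shape (num_models, num_answers)
    weights:      shape (num_models,)
    """
    score_matrix = np.asarray(score_matrix, dtype=float)
    weights = np.asarray(weights, dtype=float)
    combined = weights @ score_matrix  # weighted sum -> shape (num_answers,)
    return int(np.argmax(combined)), combined

# Three hypothetical models scoring four candidate answers.
scores = [
    [0.10, 0.60, 0.20, 0.10],
    [0.30, 0.40, 0.20, 0.10],
    [0.20, 0.50, 0.20, 0.10],
]
best, combined = weighted_ensemble(scores, weights=[0.5, 0.3, 0.2])
print(best, combined)  # index of the ensembled top answer and its scores
```

In practice the per-model weights would be selected on a validation split rather than fixed by hand.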
“…Interestingly, concurrently with the ST-VQA challenge, a work similar to ours introduced a new dataset [24] called Text-VQA. This work and the corresponding dataset were published while ST-VQA challenge was on-going.…”
Section: Introduction (mentioning)
confidence: 99%