Modeling Context in Referring Expressions
2016 · Preprint
DOI: 10.48550/arxiv.1608.00272

Cited by 29 publications (3 citation statements) · References 28 publications
“…In this paper, we solve a much harder task: we train our model to directly predict the bounding box, given a referring expression and the associated image. There are three established datasets for this task, called RefCOCO, RefCOCO+ [71] and RefCOCOg [36]. Since during pre-training we annotate every object referred to within the text, there is a slight shift in the way the model is used in this task.…”
Section: Downstream Tasks
confidence: 99%
“…We hypothesize that it enables the attention pattern for each question type to specialize accordingly to the task, thereby yielding better performance. [C. Dataset constructions, MS COCO:] On the COCO dataset, we include annotations from the referring expressions datasets (RefCOCO [71], RefCOCO+ [71] and RefCOCOg [36] datasets). By construction, in this dataset, each referring expression is a whole sentence that describes one object in the image, where the constituent noun phrases from the sentences are not themselves annotated.…”
Section: B.3.2 Question Answering Ablations
confidence: 99%
“…For further insight on this case we examine the RMAE-coverage curve (Figure 3), where we see that while combined replacement reaches a lower RMAE, it does so slowly. This is likely due to the difference in the data present in the two splits: TestA focuses primarily on people, while TestB focuses on objects (Yu et al. 2016). The more varied set of classes leads to a more spread-out output distribution, meaning a target object may be re-queried several times before the combined distribution achieves a high certainty.…”
Section: Smart Replacement
confidence: 99%