2019
DOI: 10.1007/978-3-030-20870-7_28

PIRC Net: Using Proposal Indexing, Relationships and Context for Phrase Grounding

Abstract: Phrase Grounding aims to detect and localize objects in images that are referred to and are queried by natural language phrases. Phrase grounding finds applications in tasks such as Visual Dialog, Visual Search and Image-text co-reference resolution. In this paper, we present a framework that leverages information such as phrase category, relationships among neighboring phrases in a sentence and context to improve the performance of phrase grounding systems. We propose three modules: Proposal Indexing Network (…

Cited by 11 publications
(5 citation statements)
References 31 publications
“…For example, the model can not recover when the region proposal network fails to return a bounding box for the object of interest. In this case, relative regression or query-based approaches can be used [7,22]. Backbones and word attention Surprisingly, some of the higher ranked methods use the same visual backbone as their lower ranked competitors.…”
Section: Qualitative Comparison
Citation type: mentioning
confidence: 99%
“…For example, the model can not recover when the region proposal network fails to return a bounding box for the object of interest. In this case, relative regression or query-based approaches can be used [7,22].…”
Section: Qualitative Comparison
Citation type: mentioning
confidence: 99%
“…Hence, the ground truth has better precision in this dataset. Our model achieves 12.30% better R@1 performance than the current state-of-the-art one-stage [22] and two-stage [26] approaches. This performance boost can be attributed to the following key points: (1).…”
Section: Quantitative Analysis
Citation type: mentioning
confidence: 84%
“…In this section, we present experiments to evaluate our proposed model MAGNet on varieties of datasets with multiple evaluation metrics and compare our results to the state-of-the-art visual grounding methods [26] and [27]. Results of ablation studies with different configurations will also be reported to further explain the design decisions of the proposed model.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…
Model                 Backbone      Flickr30K accuracy (%)
Similarity Net [54]   ResNet-101    60.89
CITE [24]             ResNet-101    61.33
PIRC [55]             ResNet-101    72.83
DDPN [56]             ResNet-101    73.30
LCMCG [57]            ResNet-101    76.74
ZSGNet [58]           ResNet-50     63.39
FAOA [32]             DarkNet-53    68.71
ReSC-Large [50]       DarkNet-53    69.28
TransVG [53]          ResNet-50     78.47
TransVG [53]          ResNet-101    79.…”
Section: Models Backbone
Citation type: mentioning
confidence: 99%