Relationship-Embedded Representation Learning for Grounding Referring Expressions

2021
DOI: 10.1109/tpami.2020.2973983
Cited by 50 publications (14 citation statements)
References 54 publications

“…The proposed residual graph attention network allows us to reason the underlying relationship for a more complex expression as shown in Figure 2(b). With extensive experiments, the proposed approach achieves better performance than other state-of-the-art graph network-based approaches [4,13,3] and demonstrates its effectiveness.…”
Section: Introduction · mentioning · confidence: 92%
“…MattNet [12] introduces the modular design and improves the grounding accuracy by better modeling the subject, location, and relation-related language description. Recent studies further improve the two-stage methods by better modeling the object relationships [10], [13], [14], [15], [33], enforcing correspondence learning [34], or making use of phrase co-occurrences [35], [36], [37]. One-stage Methods.…”
Section: Visual Grounding · mentioning · confidence: 99%
“…Unlike single sentences used in most image-language tasks [36,44,32,37], we adopt free-form language descriptions composed of several sentences in our work, where additional challenges such as long-distance phrase correlations emerge. Specifically, we first build a language scene graph G l from the free-form 3D scene description L to capture the rich structure and relationships between the phrases.…”
Section: Language Scene Graph Module · mentioning · confidence: 99%