2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020
DOI: 10.1109/cvpr42600.2020.00448

Bi-Directional Relationship Inferring Network for Referring Image Segmentation

Cited by 137 publications (101 citation statements)
References 28 publications

“…Finally, the cross-modal features are used to generate the final prediction masks. Unlike existing RES methods [13,14], which segment objects according to the query text, we input text in parallel with the input image to extract information. By combining crossmodal features from both image and text, we accurately segment fluorescein leakage.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…1c). The recent success of reference expression segmentation (RES), which involves the use of natural language expressions to locate objects [13,14], suggests the possibility of using cross-modal data to build a robust and effective framework for fluorescein leakage segmentation.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Previous approaches [3,15,9,10] have shown that integrating multi-modal features from different levels of CNN can further improve the accuracy of segmentation masks. In our work, we introduce a bi-directionally convolutional GRU (Bi-ConvGRU) to progressively integrate the fused multimodal features in bottom-up and top-down manners, which is corresponding to the two directions of forward and backward paths.…”
Section: Multi-level Feature Fusion and Mask Prediction
Citation type: mentioning
confidence: 99%
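
The bottom-up and top-down integration described in this statement can be illustrated with a minimal sketch. It is not the citing authors' implementation: the convolutional gates of a ConvGRU are replaced here by element-wise scalar gates so the example stays dependency-free, each feature level is assumed to be already flattened to the same length, and the names `gru_step`, `sweep`, and `bi_directional_fuse`, along with all weights, are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(h, x, wz=1.0, wr=1.0, wh=1.0):
    """One element-wise GRU update (a simplified stand-in for a ConvGRU cell)."""
    out = []
    for h_i, x_i in zip(h, x):
        z = sigmoid(wz * (x_i + h_i))            # update gate
        r = sigmoid(wr * (x_i + h_i))            # reset gate
        cand = math.tanh(wh * (x_i + r * h_i))   # candidate state
        out.append((1.0 - z) * h_i + z * cand)   # blend old state and candidate
    return out


def sweep(levels):
    """Run the recurrent update over the feature levels in the given order."""
    h = [0.0] * len(levels[0])
    for x in levels:
        h = gru_step(h, x)
    return h


def bi_directional_fuse(levels):
    """Return the last hidden states of the two traversal directions."""
    forward = sweep(levels)          # bottom-up: low-level to high-level
    backward = sweep(levels[::-1])   # top-down: high-level to low-level
    return forward, backward


# Three hypothetical feature levels, each flattened to two values.
levels = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
forward, backward = bi_directional_fuse(levels)
```

Because the two sweeps visit the levels in opposite orders, their final hidden states generally differ, which is why both are kept for the later integration step.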
“…where $\overrightarrow{H}$ and $\overleftarrow{H}$ denote the last hidden states of two directions, $b$ is the bias term, and $V_{out} \in \mathbb{R}^{h \times w \times D_o}$ is the integrated multi-modal features. Finally, the same decoder layers and binary cross-entropy loss as previous works [3,10] are adopted to predict the segmentation mask and optimize the network, respectively.…”
Section: Multi-level Feature Fusion and Mask Prediction
Citation type: mentioning
confidence: 99%
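
As a rough illustration of the two steps this statement mentions, the sketch below combines two directional hidden states with a bias term and then scores a predicted mask with binary cross-entropy. The excerpt does not reproduce the equation that defines the integrated features, so the weighted sum in `combine` is only an assumed stand-in, and the function names and default weights are hypothetical. The loss follows the standard per-pixel definition, averaged over pixels.

```python
import math


def combine(h_fwd, h_bwd, w_fwd=0.5, w_bwd=0.5, b=0.0):
    """Assumed stand-in: weighted sum of the two last hidden states plus a bias."""
    return [w_fwd * f + w_bwd * g + b for f, g in zip(h_fwd, h_bwd)]


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a 0/1 mask."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(probs)


fused = combine([1.0, 2.0], [3.0, 4.0])              # flattened toy features
loss = binary_cross_entropy([0.9, 0.2, 0.7], [1, 0, 1])
```

The clamp on `p` is a common numerical safeguard; it only changes the result for probabilities that are exactly 0 or 1.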