2022
DOI: 10.1109/tgrs.2022.3163706
Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information

Abstract: Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become a research hotspot due to its ability to enable fast and flexible information extraction from remote sensing (RS) images. However, current RSCTIR methods mainly focus on global features of RS images, which leads to the neglect of local features that reflect target relationships and saliency. In this article, we first propose a novel RSCTIR framework based on global and local information (GaLR), and design a multi-level information dynamic fusion module to effectively integrate features of different levels. …
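The abstract describes an embedding-style retrieval setting, where images and sentences are mapped into a shared space and ranked by similarity. As a rough, hypothetical sketch of that setting (not the paper's actual GaLR code; `image_encoder` and `text_encoder` are assumed stand-ins for the two modality networks):

```python
# Minimal sketch of embedding-based cross-modal retrieval; the encoders are
# hypothetical stand-ins for any networks mapping each modality to a shared space.
import torch
import torch.nn.functional as F

def retrieve(image_encoder, text_encoder, images, captions, k=5):
    with torch.no_grad():
        img_emb = F.normalize(image_encoder(images), dim=-1)   # (N_img, D)
        txt_emb = F.normalize(text_encoder(captions), dim=-1)  # (N_txt, D)
    # Cosine similarity between every caption and every image.
    sims = txt_emb @ img_emb.t()                               # (N_txt, N_img)
    # For each caption, return indices of the k best-matching images.
    return sims.topk(k, dim=-1).indices
```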

Cited by 70 publications (28 citation statements); references 53 publications.
“…Next, to improve the inference speed of SeLo tasks, Yuan et al. [12] designed a lightweight RSCTIR model and improved it through knowledge distillation and negative sampling. In [13], the authors qualitatively compare the visualization results of their method against others, thereby verifying the effect of the proposed fusion module. Although RS multimodal semantic localization is a recently emerging task, the above works evaluate it only from a qualitative-analysis perspective, which lacks discriminative quantitative metrics and unified baselines.…”
Section: B. Multi-Modal Semantic Localization (mentioning)
confidence: 76%
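The distillation mentioned in this excerpt typically matches a lightweight student's retrieval similarities to a larger teacher's. A minimal, hypothetical sketch of such a loss (an assumed formulation, not the cited model's actual code):

```python
# Hedged sketch of similarity-matrix knowledge distillation for retrieval;
# the temperature and similarity-matrix inputs are assumptions.
import torch.nn.functional as F

def distill_loss(student_sims, teacher_sims, tau=4.0):
    """KL divergence between softened teacher and student similarity rows."""
    s = F.log_softmax(student_sims / tau, dim=-1)
    t = F.softmax(teacher_sims / tau, dim=-1)
    # tau**2 rescaling keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * tau * tau
```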
“…Yuan et al. [12] proposed a lightweight text-image retrieval model, which realized fast RS cross-modal retrieval and improved retrieval performance through knowledge distillation and contrastive learning. Further, Yuan et al. [13] added denoised detection information to the RS image representation, which greatly improved retrieval accuracy. The single-stage computation of embedding-based RSCTIR greatly reduces information loss during modality transformation, and it has become the dominant cross-modal retrieval approach in recent years.…”
Section: A. Remote Sensing Cross-Modal Retrieval (mentioning)
confidence: 99%
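Embedding-based retrieval of the kind described here is commonly trained with a bidirectional triplet loss over hardest in-batch negatives; the following is a generic sketch of that objective (a standard formulation, not necessarily the cited papers' exact loss):

```python
# Generic bidirectional hinge loss with hardest in-batch negatives;
# margin value and batch construction are illustrative assumptions.
import torch
import torch.nn.functional as F

def triplet_loss(img_emb, txt_emb, margin=0.2):
    """img_emb, txt_emb: (B, D) L2-normalized embeddings of matched pairs."""
    sims = img_emb @ txt_emb.t()                  # (B, B) cosine similarities
    pos = sims.diag()                             # matched-pair scores
    mask = torch.eye(sims.size(0), dtype=torch.bool, device=sims.device)
    neg = sims.masked_fill(mask, float("-inf"))   # exclude the positives
    hard_i2t = neg.max(dim=1).values              # hardest caption per image
    hard_t2i = neg.max(dim=0).values              # hardest image per caption
    loss = F.relu(margin + hard_i2t - pos) + F.relu(margin + hard_t2i - pos)
    return loss.mean()
```

Mining only the hardest in-batch negative, rather than summing over all negatives, is the usual choice in this line of work because it sharpens the ranking signal without extra sampling cost.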
“…Yuan et al. [44] constructed a fine-grained and more challenging remote sensing image-text matching dataset (RSITMD) and proposed a new asymmetric multimodal feature matching network (AMFMN) to address the multi-scale scarcity and target redundancy problems in RS multimodal retrieval. Later, Yuan et al. [45] proposed a framework for remote sensing image-text matching based on global and local information and designed a multi-level information dynamic fusion module to effectively integrate features at different levels.…”
Section: B. Cross-Modal Remote Sensing Image Retrieval (CMRSIR) (mentioning)
confidence: 99%
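The multi-level fusion described above suggests gating between a global image feature and an aggregated local (e.g. detection-based) feature. A hypothetical sketch of such a dynamic fusion step (illustrative only, not GaLR's published module; dimensions are assumptions):

```python
# Hypothetical gated fusion of global and local image features; the gate
# network, feature sources, and dimension are illustrative assumptions.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Learns a per-sample gate weighing a global backbone feature
    against an aggregated local (e.g. pooled detection-region) feature."""

    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, global_feat, local_feat):
        g = self.gate(torch.cat([global_feat, local_feat], dim=-1))
        return g * global_feat + (1.0 - g) * local_feat
```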