2022
DOI: 10.1109/tgrs.2021.3078451

Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval

Abstract: Remote sensing (RS) cross-modal text-image retrieval has attracted extensive attention for its advantages of flexible input and efficient query. However, traditional methods ignore the characteristics of multiscale and redundant targets in RS images, leading to degraded retrieval accuracy. To cope with the problems of multiscale scarcity and target redundancy in the RS multimodal retrieval task, we propose a novel asymmetric multimodal feature matching network (AMFMN). Our model adapts to multiscale f…
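As a rough illustration of the kind of multiscale visual feature extraction the abstract alludes to, the sketch below pools a CNN activation map at several grid sizes and projects the resulting region vectors into a shared embedding space. The module name, scales, and dimensions are assumptions for illustration; this is not the paper's actual AMFMN architecture.

# Hypothetical sketch, not the paper's exact module: multiscale region
# features obtained by pooling one CNN feature map at several grid sizes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscalePooling(nn.Module):
    def __init__(self, in_channels=2048, embed_dim=512, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # A single projection shared across scales (an assumption).
        self.proj = nn.Linear(in_channels, embed_dim)

    def forward(self, feat_map):  # feat_map: (B, C, H, W) from a CNN backbone
        pooled = []
        for s in self.scales:
            # Average-pool to an s x s grid, then flatten to s*s region vectors.
            p = F.adaptive_avg_pool2d(feat_map, s)        # (B, C, s, s)
            pooled.append(p.flatten(2).transpose(1, 2))   # (B, s*s, C)
        regions = torch.cat(pooled, dim=1)                # (B, sum(s^2), C)
        return self.proj(regions)                         # (B, N, embed_dim)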

Cited by 80 publications (68 citation statements)
References 55 publications
“…To obtain a more robust embedding, Abdullah et al. [8] proposed a deep bidirectional triplet network to learn a joint encoding across modalities and used an averaging fusion strategy to fuse the features of multiple text-image pairs. To solve the problems of multi-scale scarcity and target redundancy in RSCTIR, Yuan et al. [14] proposed an asymmetric multimodal feature matching network (AMFMN) and contributed a fine-grained RS image-text dataset for this task. By exploring the potential correspondence between RS images and text, Cheng et al. [9] proposed a semantic alignment module to obtain a more discriminative feature representation.…”
Section: A. RS Cross-Modal Text-Image Retrieval (RSCTIR)
confidence: 99%
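The triplet-based objective mentioned in this excerpt is commonly implemented as a bidirectional hinge loss over an in-batch similarity matrix. The sketch below is a minimal, generic version of that idea, not the exact objective of [8]; the margin value and the sum reduction are assumptions.

# Minimal sketch of a bidirectional triplet (hinge) loss for image-text
# retrieval, using all non-matching in-batch pairs as negatives.
import torch

def bidirectional_triplet_loss(img_emb, txt_emb, margin=0.2):
    # img_emb, txt_emb: (B, D) L2-normalized embeddings of matched pairs.
    sim = img_emb @ txt_emb.t()            # (B, B) pairwise similarities
    pos = sim.diag().view(-1, 1)           # similarity of each matched pair
    # Image-to-text direction: every non-matching caption is a negative.
    cost_i2t = (margin + sim - pos).clamp(min=0)
    # Text-to-image direction: every non-matching image is a negative.
    cost_t2i = (margin + sim - pos.t()).clamp(min=0)
    # Zero out the diagonal so matched pairs contribute no cost.
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_i2t = cost_i2t.masked_fill(mask, 0)
    cost_t2i = cost_t2i.masked_fill(mask, 0)
    return cost_i2t.sum() + cost_t2i.sum()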
“…1) Visual Representation: In traditional methods, CNNs are often used to construct the visual mapping W_v so as to embed the images [41]. Although these methods directly map the global information into a high-dimensional space, they ignore the distinction between significant objects and redundant information due to the complexity of RS images [14]. In addition, the relationships between objects in RS images cannot be well represented by global features, which may degrade retrieval performance.…”
Section: A. Formulation
confidence: 99%
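The "traditional" global embedding this excerpt critiques can be sketched as a CNN backbone pooled to a single vector and mapped by W_v into the joint space. The backbone choice and dimensions below are assumptions for illustration, and the sketch deliberately shows why object-level distinctions are lost: all spatial detail collapses into one vector.

# Sketch of a global visual encoder: one pooled CNN feature mapped by W_v.
import torch
import torch.nn as nn
import torchvision.models as models

class GlobalVisualEncoder(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        resnet = models.resnet18(weights=None)
        # Keep everything up to and including global average pooling.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.W_v = nn.Linear(resnet.fc.in_features, embed_dim)  # the W_v map

    def forward(self, images):                     # images: (B, 3, H, W)
        feats = self.backbone(images).flatten(1)   # (B, 512) global feature
        return self.W_v(feats)                     # (B, embed_dim)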