2020
DOI: 10.3390/app10248931
Deep Unsupervised Embedding for Remote Sensing Image Retrieval Using Textual Cues

Abstract: Compared to image-image retrieval, text-image retrieval has been less investigated in the remote sensing community, possibly because of the complexity of appropriately tying textual data to respective visual representations. Moreover, a single image may be described via multiple sentences according to the perception of the human labeler and the structure/body of the language they use, which magnifies the complexity even further. In this paper, we propose an unsupervised method for text-image retrieval in remot…

Cited by 22 publications (13 citation statements)
References 45 publications (55 reference statements)
“…1) Image-text retrieval method: Image-text retrieval methods [15], [16] try to search for desired images (voices) according to the corresponding text descriptions (images). Abdullah et al. [15] proposed a Deep Bidirectional Triplet Network (DBTN) to fuse multiple text descriptions with an average fusion strategy.…”
Section: B. Remote Sensing Cross-Modal Retrieval
confidence: 99%
“…Abdullah et al. [15] proposed a Deep Bidirectional Triplet Network (DBTN) to fuse multiple text descriptions with an average fusion strategy. Rahhal et al. [16] proposed a visual Big Transfer (BiT) model and a Bidirectional Long Short-Term Memory (Bi-LSTM) network to learn image and text features, respectively. Cheng et al. [32] proposed a Semantic Alignment Module (SAM) to enhance the latent correlation between remote sensing image and text features.…”
Section: B. Remote Sensing Cross-Modal Retrieval
confidence: 99%
“…Motivated by this, RS image retrieval has been explored using queries of different modalities such as images from different sources or sensors [11], [12], [13], [14], sketches [15], [16], speech [17], [18], [19], [20], [21], and text [22], [23], [24], [25], [26], [27]. Among these modalities, textual descriptions represent the most intuitive way of communicating with machines.…”
Section: Introduction
confidence: 99%
“…In the literature, only a few works have been developed for text-image retrieval [22], [23], [24], [25], [26]. All of these works have been designed to allow English as the primary language of the query.…”
Section: Introduction
confidence: 99%