2020
DOI: 10.1016/j.patrec.2020.02.006

CMIR-NET: A deep learning based model for cross-modal retrieval in remote sensing

Abstract: We address the problem of cross-modal information retrieval in the domain of remote sensing. In particular, we are interested in two application scenarios: i) cross-modal retrieval between panchromatic (PAN) and multi-spectral imagery, and ii) multi-label image retrieval between very high resolution (VHR) images and speech-based label annotations. These multi-modal retrieval scenarios are more challenging than the traditional uni-modal retrieval approaches given the inherent differences in distributions between…
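The retrieval setup the abstract describes can be illustrated with a minimal sketch: two modality-specific encoders project PAN and multi-spectral features into a shared embedding space, and retrieval ranks items of one modality by cosine similarity to a query from the other. Everything below — the linear stand-in "encoders", the feature dimensions, and the variable names — is an illustrative assumption, not the actual CMIR-NET architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    # Project each row onto the unit sphere so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Stand-in linear "encoders" mapping each modality into a shared 32-d space
# (hypothetical dimensions; the real model uses learned deep networks).
W_pan = rng.normal(size=(64, 32))   # PAN features  -> shared space
W_ms  = rng.normal(size=(128, 32))  # multi-spectral features -> shared space

pan_feats = rng.normal(size=(5, 64))    # 5 PAN query images
ms_feats  = rng.normal(size=(10, 128))  # 10 multi-spectral gallery images

q = l2_normalize(pan_feats @ W_pan)
g = l2_normalize(ms_feats @ W_ms)

# Cross-modal retrieval: rank gallery images by cosine similarity to each query.
sims = q @ g.T                       # shape (5, 10)
ranking = np.argsort(-sims, axis=1)  # best match first
print(ranking[:, 0])                 # top-1 gallery index per query
```

In a trained system the projections would be optimized so that corresponding PAN and multi-spectral views land close together in the shared space; here the weights are random, so only the retrieval mechanics are shown.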


Cited by 57 publications (44 citation statements)
References 18 publications (33 reference statements)
“…By definition, the task of image fusion aims at synergistically combining images from different related modalities to generate a merged representation of the information present in the images, improving visual inference performance over the individual images. Growing interest from the multimedia community is reflected in various works such as [21], where audio-visual cross-modal representation learning was proposed, [22], where RGB-depth multimodal features were fused for scene classification, and shared cross-modal image retrieval [23]. It is also an emerging topic in medical image classification.…”
Section: Related Work
confidence: 99%
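The fusion idea in the excerpt above — merging features from related modalities into one representation — can be sketched at its simplest as late fusion by concatenation. The modalities, sample counts, and feature sizes below are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative per-modality feature vectors (e.g. RGB and depth branches).
rgb_feat   = rng.normal(size=(4, 16))  # 4 samples, 16-d RGB features
depth_feat = rng.normal(size=(4, 8))   # 4 samples, 8-d depth features

# Late fusion: concatenate each sample's modality features into one
# merged representation that a downstream classifier would consume.
fused = np.concatenate([rgb_feat, depth_feat], axis=1)
print(fused.shape)  # (4, 24)
```

Real fusion networks typically learn the combination (e.g. with attention or shared layers) rather than concatenating raw features, but the merged-representation goal is the same.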
“…In the same way, multi-source data can be fused using, e.g., CMIR-NET, which learns from two separate but labelled data sets [62]. Compared to other techniques, high performance can only be achieved with a very large amount of data.…”
Section: Multi-SAR System with Image Fusion
confidence: 99%
“…Feature extraction with deep learning-based methods is found in several applications with remote sensing imagery [10][11][12][13][14][15][16][17][18]. These deep networks are built with different types of architectures that follow a hierarchical type of learning.…”
Section: Introduction
confidence: 99%