2023
DOI: 10.1109/jstars.2022.3226325
Hypergraph-Enhanced Textual-Visual Matching Network for Cross-Modal Remote Sensing Image Retrieval via Dynamic Hypergraph Learning

Abstract: Cross-modal remote sensing (RS) image retrieval aims to retrieve RS images using other modalities (e.g., text) and vice versa. The relationship between objects in RS image

Cited by 14 publications (5 citation statements)
References 42 publications
“…(Yin et al. 2022) extracts features of historical context content to guide dynamic hypergraph construction. (Yao et al. 2022) introduces an attention mechanism to alternately update vertices and hyperedges. Despite the desirable success of HGNNs, substantial domain knowledge is required to manually design the architecture.…”
Section: Related Work, Hypergraph Neural Network
confidence: 99%
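The "alternate update of vertices and hyperedges" mentioned above can be illustrated with a minimal sketch: hyperedges first aggregate their member vertices with attention weights, then each vertex aggregates the hyperedges it belongs to. All names (`hypergraph_step`, `attn_aggregate`, the toy features) are illustrative, not from the cited papers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attn_aggregate(query, members):
    """Attention-weighted sum of member feature vectors against a query."""
    scores = softmax([dot(query, m) for m in members])
    dim = len(query)
    return [sum(w * m[i] for w, m in zip(scores, members)) for i in range(dim)]

def hypergraph_step(vertex_feats, hyperedges):
    """One alternating update: hyperedges from vertices, then vertices
    from their incident hyperedges (a simplified HGNN message pass)."""
    # Step 1: each hyperedge attends over its member vertices,
    # using the mean of the members as the attention query.
    edge_feats = []
    for edge in hyperedges:
        members = [vertex_feats[v] for v in edge]
        mean = [sum(col) / len(members) for col in zip(*members)]
        edge_feats.append(attn_aggregate(mean, members))
    # Step 2: each vertex attends over the hyperedges containing it.
    new_vertex_feats = []
    for v, feat in enumerate(vertex_feats):
        incident = [edge_feats[e] for e, edge in enumerate(hyperedges) if v in edge]
        new_vertex_feats.append(attn_aggregate(feat, incident) if incident else feat)
    return new_vertex_feats, edge_feats

# Toy hypergraph: 3 vertices with 2-D features, 2 hyperedges.
vertices = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [[0, 1], [1, 2]]
updated, edge_feats = hypergraph_step(vertices, edges)
```

A vertex belonging to a single hyperedge simply inherits that hyperedge's feature, since attention over one member is a no-op; the alternation becomes meaningful once vertices sit in several overlapping hyperedges.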
“…For performance comparison, we selected recent common feature space models as baselines based on RSICD, namely AMFMN [18] and its three variants (AMFMN-soft, AMFMN-fusion, AMFMN-sim), HyperMatch [30], and CMFM-Net [24]. The reasons for selecting them are as follows: Firstly, they all belong to the common feature space approach and address the multi-scale problem.…”
Section: Basic Experiments, 4.3.1 Basic Experiments on RSICD
confidence: 99%
“…Furthermore, several studies address the challenges posed by the multi-scale features of remote sensing images, as differences in target scales make the semantic alignment of cross-modal features more complex [30]. As documented in [18,24,30,31], two main challenges arise in cross-modal retrieval due to multiple scales: (1) effectively utilizing the diverse scale features of an image, including emphasizing salient features and preserving information related to small targets; (2) modeling the intricate relationships among multi-scale targets. To address these challenges, Yuan et al. [18] introduced a multi-scale vision self-attention module that comprehensively investigates multi-scale information and eliminates redundant features by merging cross-layer features of a convolutional neural network (CNN).…”
Section: Introduction
confidence: 99%
“…Some alternative methods have also served as visual feature encoders. For instance, References [14,15] employed hypergraph neural networks to construct visual encoders. Reference [16] applied the concept of graph neural networks and built text and remote sensing image modules, achieving an interactive fusion of image and text features.…”
Section: Introduction
confidence: 99%