Hypergraph-Enhanced Textual-Visual Matching Network for Cross-Modal Remote Sensing Image Retrieval via Dynamic Hypergraph Learning

Yao, Fanglong; Sun, Xian; Liu, Nayu; Tian, Changyuan; Xu, Lei; Hu, Leiyi; Ding, Chenxu

doi:10.1109/jstars.2022.3226325

Cited by 14 publications (5 citation statements); references 42 publications

“…(Yin et al 2022) extracts features of historical context content to provide guidance for dynamic hypergraph construction. (Yao et al 2022) introduces the attention mechanism to achieve alternate update of vertices and hyperedges. Despite the desirable success of HGNNs, domain knowledge is highly required to manually design the architecture.…”

Section: Related Work, Hypergraph Neural Network (mentioning; confidence: 99%)
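The quote credits Yao et al. 2022 with using attention to update vertices and hyperedges alternately. Here is a minimal sketch of one such alternating step. It assumes plain dot-product attention with no learned projections (a trained model would normally add them), hyperedges given as lists of vertex indices, and hypothetical helper names (`hypergraph_attention_step`, `softmax`, `dot`, `weighted_sum`). It is an illustration, not the published formulation.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(vectors, weights):
    # Combine feature vectors with the given attention weights.
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


def hypergraph_attention_step(vertices, hyperedges, edge_feats):
    """One alternating update (hypothetical sketch, no learned weights).

    vertices:   list of vertex feature vectors
    hyperedges: list of hyperedges, each a list of vertex indices
    edge_feats: current hyperedge feature vectors (used as attention queries)
    """
    # Step 1, vertex -> hyperedge: each hyperedge attends over its member vertices.
    new_edges = []
    for e, members in enumerate(hyperedges):
        scores = [dot(edge_feats[e], vertices[v]) for v in members]
        new_edges.append(weighted_sum([vertices[v] for v in members], softmax(scores)))

    # Step 2, hyperedge -> vertex: each vertex attends over the updated
    # hyperedges that contain it.
    new_vertices = []
    for v, x in enumerate(vertices):
        incident = [e for e, members in enumerate(hyperedges) if v in members]
        if not incident:
            new_vertices.append(list(x))  # isolated vertex keeps its features
            continue
        scores = [dot(x, new_edges[e]) for e in incident]
        new_vertices.append(weighted_sum([new_edges[e] for e in incident], softmax(scores)))
    return new_vertices, new_edges


verts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [[0, 1], [1, 2]]
new_v, new_e = hypergraph_attention_step(verts, edges, [[0.0, 0.0], [0.0, 0.0]])
```

With all-zero initial hyperedge features, the first step reduces to a mean over each hyperedge's members. Vertex 1 belongs to both hyperedges, so it receives a softmax-weighted mix of the two.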

Hypergraph Neural Architecture Search

Lin, Peng, et al. 2024, AAAI

In recent years, Hypergraph Neural Networks (HGNNs) with manually designed architectures have achieved considerable success, as they can extract effective patterns with high-order interactions from non-Euclidean data. However, such a design process is extremely inefficient and demands tremendous human effort to tune diverse model parameters. In this paper, we propose a novel Hypergraph Neural Architecture Search (HyperNAS) to automatically design optimal HGNNs. The proposed model constructs a search space suitable for hypergraphs and derives hypergraph architectures through differentiable search strategies. A hypergraph structure-aware distance criterion is introduced as a guideline for obtaining an optimal hypergraph architecture via the leave-one-out method. Experimental results for node classification on the benchmark Cora, Citeseer, and Pubmed citation networks and on hypergraph datasets show that HyperNAS outperforms existing HGNN models and graph NAS methods.
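The abstract says only "differentiable search strategies." A common way to make an architecture choice differentiable is a DARTS-style continuous relaxation. Each candidate operation is weighted by a softmax over architecture parameters (alphas), and after search the op with the largest alpha is kept. The sketch below assumes that relaxation. The candidate set (`op_identity`, `op_zero`, `op_hyperedge_mean`) and all function names are hypothetical. Learning the alphas by gradient descent and the paper's structure-aware distance criterion with leave-one-out are not shown.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate operations on per-vertex features; a real search
# space would contain parameterized hypergraph convolutions.
def op_identity(x, hyperedges):
    return [list(v) for v in x]


def op_zero(x, hyperedges):
    return [[0.0] * len(v) for v in x]


def op_hyperedge_mean(x, hyperedges):
    # Vertex -> hyperedge mean, then hyperedge -> vertex mean.
    dim = len(x[0])
    edge_means = [[sum(x[v][i] for v in m) / len(m) for i in range(dim)] for m in hyperedges]
    out = []
    for v in range(len(x)):
        inc = [edge_means[e] for e, m in enumerate(hyperedges) if v in m]
        if inc:
            out.append([sum(em[i] for em in inc) / len(inc) for i in range(dim)])
        else:
            out.append(list(x[v]))  # vertex in no hyperedge: pass through
    return out


CANDIDATES = {
    "identity": op_identity,
    "zero": op_zero,
    "hyperedge_mean": op_hyperedge_mean,
}


def mixed_op(x, hyperedges, alphas):
    """Continuous relaxation: softmax(alpha)-weighted sum of every candidate op."""
    names = list(CANDIDATES)
    weights = softmax([alphas[n] for n in names])
    outs = [CANDIDATES[n](x, hyperedges) for n in names]
    dim = len(x[0])
    return [
        [sum(w * o[v][i] for w, o in zip(weights, outs)) for i in range(dim)]
        for v in range(len(x))
    ]


def derive(alphas):
    """Discretize after search: keep the op with the largest architecture weight."""
    return max(alphas, key=alphas.get)
```

With equal alphas every candidate contributes one third of the output. As training pushes one alpha up, `mixed_op` approaches that single op, which is what `derive` then selects.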

“…For performance comparison, we selected three recent common feature space models, namely AMFMN [18] and its three variants (AMFMN-soft, AMFMN-fusion, AMFMN-sim), HyperMatch [30], and CMFM-Net [24], as baseline models based on RSICD. The reasons for selecting them are as follows: Firstly, they both belong to the common feature space approach and address the multi-scale problem.…”

Section: Basic Experiments, 4.3.1 Basic Experiments on RSICD (mentioning; confidence: 99%)
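Comparisons on RSICD and RSITMD, including those in the FAAMI and ViT-based abstracts in this listing, are reported with Recall@K (R@1, R@5, R@10) in both retrieval directions and with mR. The sketch below assumes the usual definition of mR as the mean of those R@K values. Given an image-by-caption similarity matrix and the image each caption belongs to, it computes all of them. Function names are hypothetical.

```python
def recall_at_k(sims, caption_to_image, ks=(1, 5, 10)):
    """Image->text and text->image Recall@K, in percent.

    sims[i][j]:          similarity between image i and caption j
    caption_to_image[j]: index of the image that caption j describes
    """
    n_img, n_cap = len(sims), len(caption_to_image)
    i2t, t2i = {}, {}
    for k in ks:
        # Image -> text: a hit if any ground-truth caption is in the top k.
        hits = 0
        for i in range(n_img):
            top = sorted(range(n_cap), key=lambda j: sims[i][j], reverse=True)[:k]
            if any(caption_to_image[j] == i for j in top):
                hits += 1
        i2t[k] = 100.0 * hits / n_img
        # Text -> image: a hit if the ground-truth image is in the top k.
        hits = 0
        for j in range(n_cap):
            top = sorted(range(n_img), key=lambda i: sims[i][j], reverse=True)[:k]
            if caption_to_image[j] in top:
                hits += 1
        t2i[k] = 100.0 * hits / n_cap
    return i2t, t2i


def mean_recall(i2t, t2i):
    """mR: average of all Recall@K values in both directions."""
    values = list(i2t.values()) + list(t2i.values())
    return sum(values) / len(values)
```

On RSICD and RSITMD each image has several captions, which is why the image-to-text direction counts a hit when any of them is ranked in the top K.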

“…Furthermore, several studies address the challenges posed by the multi-scale features of remote sensing images, as the differences in target scales make the semantic alignment of cross-modal features more complex [30]. As documented in [18,24,30,31], two main challenges arise in cross-modal retrieval due to multiple scales: (1) effectively utilizing the diverse scale features of an image, including emphasizing salient features and preserving information related to small targets; (2) modeling the intricate relationships among multiscale targets. To address these challenges, Yuan et al [18] introduced a multi-scale vision self-attention module that comprehensively investigates multi-scale information and eliminates redundant features by merging cross-layer features of a convolutional neural network (CNN).…”

Section: Introduction (mentioning; confidence: 99%)

“…Additionally, Wang et al [31] designed a lightweight sub-module for multi-scale feature exploration that utilizes parallel networks with distinct receptive fields to extract and integrate multi-scale features. Yao et al [30] focused on modeling the relationships among multi-scale targets by constructing hypergraph networks at different levels to depict the connections between objects of varying scales.…”

Section: Introduction (mentioning; confidence: 99%)
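The quote describes Yao et al. [30] as building hypergraphs at different levels to connect objects of varying scales. The sketch below shows one simple construction under stated assumptions. Each region gets a hyperedge containing itself and its k most cosine-similar regions. This is done within each scale and again across all scales pooled together. The helper names are hypothetical, and the construction used in HyperMatch may differ.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb + 1e-12)


def knn_hyperedges(features, k):
    """One hyperedge per node: the node plus its k most similar other nodes."""
    edges = []
    for i, f in enumerate(features):
        others = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: cosine(f, features[j]),
            reverse=True,
        )
        edges.append(sorted([i] + others[:k]))
    return edges


def multi_level_hyperedges(scale_features, k):
    """scale_features: one list of region features per scale.

    Returns hyperedges over a global node index: intra-scale hyperedges for
    each level, followed by inter-scale hyperedges built on all regions.
    """
    offsets, all_feats = [], []
    for feats in scale_features:
        offsets.append(len(all_feats))
        all_feats.extend(feats)
    edges = []
    for off, feats in zip(offsets, scale_features):
        edges += [[off + v for v in e] for e in knn_hyperedges(feats, k)]
    edges += knn_hyperedges(all_feats, k)
    return edges


scales = [
    [[1.0, 0.0], [0.9, 0.1]],  # e.g. regions from a coarse feature map
    [[0.0, 1.0], [0.1, 0.9]],  # e.g. regions from a finer feature map
]
hyperedges = multi_level_hyperedges(scales, k=1)
```

The resulting lists of vertex indices can be fed to a hypergraph layer such as the alternating attention update discussed earlier in this listing.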

A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval

Zheng, Wang, Wang, et al. 2023, Sensors

Due to the swift growth in the scale of remote sensing imagery, scholars have progressively directed their attention towards achieving efficient and adaptable cross-modal retrieval for remote sensing images. They have also steadily tackled the distinctive challenge posed by the multi-scale attributes of these images. However, existing studies primarily concentrate on the characterization of these features, neglecting the comprehensive investigation of the complex relationship between multi-scale targets and the semantic alignment of these targets with text. To address this issue, this study introduces a fine-grained semantic alignment method that adequately aggregates multi-scale information (referred to as FAAMI). The proposed approach comprises multiple stages. Initially, we employ a computing-friendly cross-layer feature connection method to construct a multi-scale feature representation of an image. Subsequently, we devise an efficient feature consistency enhancement module to rectify the incongruous semantic discrimination observed in cross-layer features. Finally, a shallow cross-attention network is employed to capture the fine-grained semantic relationship between multiple-scale image regions and the corresponding words in the text. Extensive experiments were conducted using two datasets: RSICD and RSITMD. The results demonstrate that the performance of FAAMI surpasses that of recently proposed advanced models in the same domain, with significant improvements observed in R@K and other evaluation metrics. Specifically, the mR values achieved by FAAMI are 23.18% and 35.99% for the two datasets, respectively.
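The last stage of FAAMI uses a shallow cross-attention network to relate multi-scale image regions to words. The abstract does not specify that module. As a generic illustration, the sketch below has each word attend over all regions with single-head scaled dot-product attention. It then scores the image-text pair by the average cosine similarity between each word and its attended region. It assumes no learned query/key/value projections, and the function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb + 1e-12)


def cross_attend(words, regions):
    """Each word (query) attends over all image regions (keys = values)."""
    d = len(words[0])
    attended = []
    for w in words:
        # Scaled dot-product scores between this word and every region.
        scores = [sum(a * b for a, b in zip(w, r)) / math.sqrt(d) for r in regions]
        weights = softmax(scores)
        attended.append([sum(wt * r[i] for wt, r in zip(weights, regions)) for i in range(d)])
    return attended


def image_text_score(words, regions):
    """Average cosine similarity between each word and its attended region."""
    attended = cross_attend(words, regions)
    return sum(cosine(w, a) for w, a in zip(words, attended)) / len(words)
```

Because each word selects its own mixture of regions, a word naming a small object can focus on a small-scale region rather than on the globally dominant ones.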

“…Some alternative methods have also served as visual feature encoders. For instance, References [14,15] employed hypergraph neural networks to construct visual encoders. Reference [16] applied the concept of image neural networks and built text and remote sensing image modules, achieving an interactive fusion of image and text features.…”

Section: Introduction (mentioning; confidence: 99%)

An Enhanced Feature Extraction Framework for Cross-Modal Image–Text Retrieval

Zhang, Wang, Zheng, et al. 2024, Remote Sensing

In general, remote sensing images depict intricate scenes. In cross-modal retrieval tasks involving remote sensing images, the accompanying text contains abundant information but emphasizes mainly large objects, which attract more attention, so features of small targets are often omitted. While the conventional vision transformer (ViT) method adeptly captures information regarding large global targets, its capability to extract features of small targets is limited. This limitation stems from the constrained receptive field in ViT’s self-attention layer, which hinders the extraction of information pertaining to small targets due to interference from large targets. To address this concern, this study introduces a patch classification framework based on feature similarity, which establishes distinct receptive fields in the feature space to mitigate interference from large targets on small ones, thereby enhancing the ability of traditional ViT to extract features from small targets. We conducted evaluation experiments on two popular datasets, the Remote Sensing Image–Text Match Dataset (RSITMD) and the Remote Sensing Image Captioning Dataset (RSICD), obtaining mR indices of 35.6% and 19.47%, respectively. The proposed approach contributes to improving the detection accuracy of small targets and can be applied to more complex image–text retrieval tasks involving multi-scale ground objects.
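The abstract does not say exactly how patches are classified. One way to picture "distinct receptive fields in the feature space" is to group patches by feature similarity and let self-attention run only within each group. A small-target patch then is not averaged together with dissimilar large-target patches. The sketch below assumes fixed centroid vectors and cosine similarity for the grouping. `assign_groups` and `grouped_self_attention` are hypothetical names, not the authors' framework.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb + 1e-12)


def assign_groups(patches, centroids):
    """Label each patch with the index of its most similar centroid."""
    return [max(range(len(centroids)), key=lambda c: cosine(p, centroids[c])) for p in patches]


def grouped_self_attention(patches, groups):
    """Scaled dot-product self-attention restricted to patches in the same group."""
    d = len(patches[0])
    out = []
    for i, q in enumerate(patches):
        members = [j for j in range(len(patches)) if groups[j] == groups[i]]
        scores = [sum(a * b for a, b in zip(q, patches[j])) / math.sqrt(d) for j in members]
        weights = softmax(scores)
        out.append([sum(w * patches[j][k] for w, j in zip(weights, members)) for k in range(d)])
    return out


patches = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
groups = assign_groups(patches, centroids=[[1.0, 0.0], [0.0, 1.0]])
attended = grouped_self_attention(patches, groups)
```

Here the third patch falls in its own group, so its output is unchanged by the two similar patches in the other group.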

Abstract (Yao et al. 2022, truncated in source): Cross-modal remote sensing (RS) image retrieval aims to retrieve RS images using other modalities (e.g., text) and vice versa. The relationship between objects in RS image…
