Deep Multi-Semantic Fusion-Based Cross-Modal Hashing

Zhu, Xinghui; Cai, Liewu; Zou, Zhuoyang; Zhu, Lei

doi:10.3390/math10030430

Cited by 3 publications

(4 citation statements)

References 59 publications

“…In addition, there was also a Samsung 980 Pro2T solid-state drive. In this paper, eight advanced cross-modal retrieval methods were selected to compare with Tri-CMH, namely CMFH [19], CCA-ITQ [18], SCM [7], SePH [23], DCMH [15], TDH [27], DLFH [2], and DMSFH [25]. Among them, the first four algorithms were based on shallow frameworks, and the last four were based on deep learning.…”

Section: Methods (mentioning)

confidence: 99%

“…The hash function can be learned in conjunction with semantic tagging information during feature extraction, thereby reducing semantic differences between modalities to improve the performance of cross-modal retrieval [20]. Typical supervised cross-modal hashing methods include the discrete latent factor hashing (DLFH) cross-modal method [2], the semantic correlation maximization (SCM) method [7], the deep cross-modal hashing (DCMH) method [15], the generalized semantic preserving hashing for n-label cross-modal retrieval [21], the multimodal latent binary embedding (MLBE) method [22], the semantic preservation hashing (SePH) method [23], the cross-view hashing (CVH) method [24], the deep multi-semantic fusion-based cross-modal hashing (DMSFH) method [25], the deep visual-semantic hashing (DVSH) method [26] and the triplet-based deep hashing (TDH) method [27]. [18] is a typical correlation analysis iterative quantification methodology proposed by Gong Y. et al. in 2012. It is an unsupervised cross-modal hash retrieval method that provides multivariate statistics by evaluating the similarity of two sets of variables.…”

Section: Related Work (mentioning)

confidence: 99%

“…DMSFH [25], proposed in 2022 by Zhu X et al., consists of deep multi-semantic fusion-based cross-modal hashing, which uses a multi-label semantic fusion method to improve cross-modal consistent semantic discrimination learning. Moreover, a graph regularization method combines inter-modal and intra-modal pairwise loss to preserve the nearest neighbor relationship between data in the Hamming subspace.…”

Section: CCA-ITQ (mentioning)

confidence: 99%
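
The pairwise loss mentioned in the statement above can be illustrated with a minimal sketch. It assumes the negative log-likelihood form that is common in deep cross-modal hashing (DCMH-style); the statement does not say whether DMSFH uses exactly this form. The function names `label_similarity` and `pairwise_nll` are hypothetical, and two samples are treated as similar when they share at least one label.

```python
import math


def label_similarity(labels_a, labels_b):
    """Return 1 if two multi-label vectors share at least one label, else 0.

    Assumption: this is the usual 0/1 similarity definition in supervised
    cross-modal hashing, not necessarily the fused semantics used by DMSFH.
    """
    return 1 if any(a and b for a, b in zip(labels_a, labels_b)) else 0


def pairwise_nll(u, v, s):
    """Negative log-likelihood of similarity s for real-valued codes u and v.

    theta is half the inner product of the two codes. The loss is
    log(1 + exp(theta)) - s * theta, with the softplus computed stably.
    """
    theta = 0.5 * sum(x * y for x, y in zip(u, v))
    softplus = max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))
    return softplus - s * theta


# A similar pair is penalized less when its codes agree than when they disagree.
image_code = [1.0, 1.0, 1.0, 1.0]
text_code_close = [1.0, 1.0, 1.0, 1.0]
text_code_far = [-1.0, -1.0, -1.0, -1.0]
s = label_similarity([1, 0, 1], [0, 0, 1])
loss_close = pairwise_nll(image_code, text_code_close, s)
loss_far = pairwise_nll(image_code, text_code_far, s)
```

The same function can be applied to image-text pairs (inter-modal) and to image-image or text-text pairs (intra-modal); how the two terms are weighted is a design choice that this sketch leaves open.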

A Cross-Modal Hash Retrieval Method with Fused Triples

Li, Mei, et al. 2023, Applied Sciences

Owing to its fast retrieval speed and low storage cost, cross-modal hashing has become the primary method for cross-modal retrieval. Since the emergence of deep cross-modal hashing methods, cross-modal retrieval has improved significantly. However, existing cross-modal hash retrieval methods still do not use the dataset's supervisory information effectively and lack similarity expression ability. As a result, the label information is not fully exploited and the potential semantic relationship between two modalities cannot be fully explored, which affects the judgment of semantic similarity between the two modalities. To address these problems, this paper proposes Tri-CMH, a cross-modal hash retrieval method with fused triples, which is an end-to-end modeling framework consisting of two parts: feature extraction and hash learning. First, the multi-modal data are preprocessed into triples. The data supervision matrix is constructed so that samples whose labels share the same meaning are aggregated together, while samples whose labels have opposite meanings are separated, which avoids under-utilizing the supervisory information in the dataset and makes efficient use of the global supervisory information. Meanwhile, the loss function of the hash learning part is optimized by considering the Hamming distance loss, single-modality internal loss, cross-modality loss, and quantization loss, in order to explicitly constrain semantically similar and semantically dissimilar hash codes and to improve the model's ability to judge cross-modality semantic similarity. The method is trained and tested on the IAPR-TC12, MIRFLICKR-25K, and NUS-WIDE datasets, with mAP and the PR curve as evaluation criteria, and the experimental results show the effectiveness and practicality of the method.
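
A few of the quantities named in this abstract can be sketched in plain Python. This is only an illustration under stated assumptions: the abstract does not give Tri-CMH's formulas, so the hinge-style triplet loss, the squared-error quantization loss, and the `margin` value below are generic stand-ins, and all function names are hypothetical. Hash codes are assumed to be vectors over {-1, +1}.

```python
def sign(x):
    """Map a real value to a binary code bit in {-1, +1}."""
    return 1 if x >= 0 else -1


def hamming(a, b):
    """Number of positions in which two equal-length codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def triplet_hamming_loss(anchor, positive, negative, margin=2):
    """Hinge loss asking the positive to be closer than the negative by `margin`."""
    return max(0, hamming(anchor, positive) - hamming(anchor, negative) + margin)


def quantization_loss(h):
    """Squared gap between real-valued outputs h and their binarized codes."""
    return sum((x - sign(x)) ** 2 for x in h)


def average_precision(relevance):
    """AP for one query, given a ranked list of 0/1 relevance flags."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


anchor = [1, 1, 1, 1]
positive = [1, 1, 1, -1]    # Hamming distance 1 from the anchor
negative = [-1, -1, -1, 1]  # Hamming distance 3 from the anchor
well_separated = triplet_hamming_loss(anchor, positive, negative)
violated = triplet_hamming_loss(anchor, negative, positive)
```

mAP is then the mean of `average_precision` over all queries, where each query's relevance list is obtained by ranking the database items by Hamming distance to the query code.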

“…Using cosine distance and Euclidean distance, the same measurement index can accurately reflect the similarity between different modal data in Deep Semantic Cross-Modal Hashing Based on Graph Similarity of Modal-Specific (DCMHGMS) [36]. The distance between similar data can be reduced by constructing ranking alignment loss to unearth the semantic structure between different modal data in Deep Rank Cross-modal Hashing (DRCH) [37,38]. Semantic weight factors are constructed to guide the optimization of the loss function and obtain better retrieval performance in Multiple Deep neural networks with Multiple labels for Cross-modal Hashing (MDMCH) [39].…”

Section: Related Work (mentioning)

confidence: 99%
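
The two measures named in the statement above can be written out as a small sketch. It shows only the standard definitions of cosine similarity and Euclidean distance, not how DCMHGMS combines them, which the statement does not describe. The helper names are hypothetical and all vectors are assumed to be non-zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def l2_normalize(u):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]


# For unit-length vectors the two measures carry the same information:
# squared Euclidean distance equals 2 - 2 * cosine similarity.
image_feature = l2_normalize([1.0, 2.0, 3.0])
text_feature = l2_normalize([2.0, 0.0, 1.0])
squared_distance = euclidean_distance(image_feature, text_feature) ** 2
from_cosine = 2 - 2 * cosine_similarity(image_feature, text_feature)
```

On unnormalized features the two measures can rank neighbors differently, because Euclidean distance is sensitive to vector length while cosine similarity is not.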

Cross-modal retrieval based on multi-dimensional feature fusion hashing

Ren, 2024, Front. Phys.

With the continuous advance and spread of information network technology, multi-modal data, including text, images, video, and audio, is growing rapidly. Data of one modality can be used to retrieve data of another to meet user needs, so cross-modal retrieval has important theoretical significance and application value. In addition, because data of different modalities can be retrieved from one another once they are mapped to a unified Hamming space, hash codes have been used extensively in the cross-modal retrieval field. However, existing cross-modal hashing models generate hash codes based on single-dimension data features, ignoring the semantic correlation between data features in different dimensions. Therefore, a cross-modal retrieval method using Multi-Dimensional Feature Fusion Hashing (MDFFH) is proposed. To better capture the image's multi-dimensional semantic features, a convolutional neural network and a Vision Transformer are combined to construct an image multi-dimensional fusion module. Similarly, a multi-dimensional text fusion module is applied to the text modality to obtain the text's multi-dimensional semantic features. These two modules integrate the semantic features of data in different dimensions through feature fusion, making the generated hash codes more representative and semantically meaningful. Extensive experiments and the corresponding analysis on two datasets indicate that MDFFH outperforms the other baseline models.
