Deep Label Feature Fusion Hashing for Cross-Modal Retrieval

Ren, Dongxiao; Xu, Wei; Wang, Zhonghua; Sun, Qinxiu

doi:10.1109/access.2022.3208147

Cited by 3 publications

(2 citation statements)

References 44 publications

“…Moreover, existing cross-modal hashing models often overlook the distinctive semantic information associated with each data point's label. To leverage the unique semantic information embedded in labels and capture the semantic correlation between different modalities of data, Ren et al [60] introduced Deep Label Feature Fusion Hashing (DLFFH). They construct corresponding label networks within different modality networks to facilitate feature fusion, aiming to embed semantic label information into data features and thereby enhance the performance of cross-modal retrieval.…”

Section: Supervised Learning

mentioning

confidence: 99%

A review of cross-modal retrieval for image-text

Xia, Yang, et al. 2024

Fifteenth International Conference on Graphics and Image Processing (ICGIP 2023)

With the rapid advancement of Internet technology and the widespread adoption of smart devices, there has been a substantial increase in multimodal data that conveys identical semantics in diverse coding formats. To foster the advancement of social intelligence, scholars are increasingly investigating the semantic correlations among multimodal data, which is a current research focus. The primary objective of cross-modal retrieval is to accurately compute the similarity between dissimilar modalities and efficiently retrieve relevant data from other modalities. The objective of this article is to provide a comprehensive overview of the advancements in cross-modal retrieval research. First, it presents a conceptual framework and problem formulation for cross-modal retrieval, elucidating the multimodal nature of image and text cross-modal retrieval. Second, it delves into semantic representation learning-based approaches for computing image-text cross-modal similarity and hash-based methods for facilitating cross-modal retrieval. Furthermore, a comparative analysis is conducted on widely adopted evaluation metrics for current cross-modal retrieval techniques, accompanied by an outlook on future research directions.

“…For example, deep neural networks can automatically capture the data features and hash functions in Refs. [15][16][17][18][19][20].…”

Section: Introduction

mentioning

confidence: 99%

Cross-modal retrieval based on multi-dimensional feature fusion hashing

Ren, 2024

Front. Phys.

Self Cite

With the continuous advancement and popularization of information network technology, multi-modal data, including text, images, video, and audio, is growing rapidly. Users can retrieve data in different modalities to meet their needs, so cross-modal retrieval has important theoretical significance and application value. In addition, because data of different modalities can be mutually retrieved by mapping them to a unified Hamming space, hash codes have been extensively used in the cross-modal retrieval field. However, existing cross-modal hashing models generate hash codes based on single-dimension data features, ignoring the semantic correlation between data features in different dimensions. Therefore, an innovative cross-modal retrieval method using Multi-Dimensional Feature Fusion Hashing (MDFFH) is proposed. To better capture the image's multi-dimensional semantic features, a convolutional neural network and a Vision Transformer are combined to construct an image multi-dimensional fusion module. Similarly, a multi-dimensional text fusion module is applied to the text modality to obtain the text's multi-dimensional semantic features. These two modules can effectively integrate the semantic features of data in different dimensions through feature fusion, making the generated hash codes more representative and semantically richer. Extensive experiments and corresponding analysis on two datasets indicate that MDFFH outperforms the other baseline models.
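The abstract's remark that different modalities can be retrieved against each other once they are mapped to a unified Hamming space can be illustrated with a small sketch. This is not the MDFFH model: the feature values are made-up toy numbers standing in for network outputs, and the helper names (`to_hash_codes`, `hamming_distances`, `retrieve`) are hypothetical. It only shows sign binarization followed by Hamming ranking of image codes against a text query code.

```python
import numpy as np


def to_hash_codes(features: np.ndarray) -> np.ndarray:
    """Binarize real-valued features into {-1, +1} codes with the sign function."""
    return np.where(features >= 0, 1, -1).astype(np.int8)


def hamming_distances(query: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Hamming distance from one code to every database code.

    For codes in {-1, +1}^K the distance equals (K - <q, b>) / 2.
    """
    k = query.shape[0]
    dots = database.astype(np.int32) @ query.astype(np.int32)
    return (k - dots) // 2


def retrieve(query: np.ndarray, database: np.ndarray, top_k: int = 3) -> np.ndarray:
    """Return indices of the top_k database codes closest to the query."""
    distances = hamming_distances(query, database)
    return np.argsort(distances, kind="stable")[:top_k]


# Toy features (assumed values, standing in for image and text network outputs).
image_features = np.array([
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.8, 0.6],
    [0.1, -0.9, 0.2, -0.1],
])
text_query_features = np.array([0.7, -0.1, 0.5, -0.3])

image_codes = to_hash_codes(image_features)
text_code = to_hash_codes(text_query_features)

distances = hamming_distances(text_code, image_codes)
ranking = retrieve(text_code, image_codes, top_k=3)
print(distances, ranking)
```

Because both modalities end up as codes of the same length, a text query can rank images with integer operations alone, which is the efficiency argument usually made for hashing-based retrieval.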

Enhancing Stock Price Prediction with Deep Cross-Modal Information Fusion Network

Mandal, Kler, Tiwari, et al. 2024

Fluct. Noise Lett.

Stock price prediction is considered a classic and challenging task, with the potential to aid traders in making more profitable trading decisions. Significant improvements in stock price prediction methods based on deep learning have been observed in recent years. However, most existing methods rely solely on historical stock price data, so they cannot capture market dynamics beyond price indicators, which limits their performance to some extent. Therefore, a novel stock price prediction method that combines social media text with historical stock price information is proposed, known as the Deep Cross-Modal Information Fusion Network (DCIFNet). DCIFNet first employs temporal convolution to encode stock prices and Twitter content, which ensures that each element has sufficient information about its surrounding components. The outputs are then fed into a transformer-based cross-modal fusion structure to enhance the integration of crucial information from stock prices and Twitter content. Lastly, a multi-graph convolution attention network is introduced to depict the relationships between different stocks from diverse perspectives. This captures industry affiliations, Wikipedia references, and associated relationships among linked stocks more effectively, ultimately improving stock price prediction accuracy. Trend prediction and simulated trading experiments are conducted on high-frequency trading datasets spanning nine different industries. Comparative assessments with the Multi-Attention Network for Stock Prediction (MANGSF) method, as well as ablation experiments, confirm the effectiveness of the DCIFNet approach, which achieves an accuracy of 0.6309, a marked improvement over representative methods in the field.
