Deep Unified Cross-Modality Hashing by Pairwise Data Alignment

Wang, Yimu; Xue, Bo; Cheng, Quan; Chen, Yuhui; Zhang, Lijun

doi:10.24963/ijcai.2021/156

Cited by 11 publications

(10 citation statements)

References 1 publication

“…To bridge modality gap, JDSH [8] exploits a joint-modal similarity matrix, while DUCMH [9] relies on data alignment and image-text data pairs. DGCPN [10] explores intrinsic semantic relationships with graph-neighbor coherence to avoid suboptimal retrieval Hamming space. With introducing knowledge distillation scheme, KDCMH [11] trains an unsupervised method as the teacher model used to provide distillation information to guide supervised method.…”

Section: A Non-continuous Cross-modal Hashing (mentioning)

confidence: 99%

“…Hash lookup, a widely used retrieval protocol in hashing-based retrieval [10], considers the retrieved instance whose Hamming distance to the query is less than Hamming radius as a positive sample. When measuring the precision of hash lookup protocol, we hope that a good hashing method can retrieve as many positive samples as possible, i.e., when the query instance o_i and the retrieved instance o_j are true relevant, the probability of being judged as relevant should be as large as possible, denoted as:…”

Section: Multi-label Semantic Similarity (mentioning)

confidence: 99%
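
The hash lookup protocol described in the statement above can be illustrated with a minimal sketch. It is not taken from the citing paper: the function names, the toy codes, the relevance labels, and the choice to treat a distance of at most `radius` as "retrieved" are all assumptions made for illustration.

```python
import numpy as np


def hamming_distances(query_code, db_codes):
    """Hamming distance between one binary code and each row of db_codes."""
    return np.count_nonzero(db_codes != query_code, axis=1)


def hash_lookup_precision(query_code, db_codes, relevant, radius=2):
    """Fraction of retrieved items that are relevant.

    An item counts as retrieved when its Hamming distance to the query
    is at most `radius` (an assumed convention; a strict inequality
    could be used instead).
    """
    dist = hamming_distances(query_code, db_codes)
    retrieved = dist <= radius
    if not retrieved.any():
        # Nothing falls inside the Hamming ball, so report zero precision.
        return 0.0
    return float(relevant[retrieved].mean())


# Toy 4-bit database (hypothetical data, not from any benchmark).
db_codes = np.array([
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
])
query_code = np.array([0, 0, 0, 0])
relevant = np.array([True, False, True, True])  # assumed ground truth

distances = hamming_distances(query_code, db_codes)
precision_r1 = hash_lookup_precision(query_code, db_codes, relevant, radius=1)
```

With these toy values, only the first two database items fall within radius 1 of the query, and one of them is labeled relevant, so the lookup precision at that radius is one half.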

Multi-Manifold Deep Discriminative Cross-Modal Hashing for Medical Image Retrieval

Zeng, Zheng, et al. 2022

IEEE Trans. on Image Process.

Hashing methods have made significant progress in cross-modal retrieval tasks thanks to fast query speed and low storage cost. Among them, deep learning-based hashing achieves better performance on large-scale data due to its excellent extraction and representation ability for nonlinear heterogeneous features. However, two main challenges remain: catastrophic forgetting when data with new categories arrive continuously, and the time-consuming retraining that non-continuous hashing retrieval requires in order to update. To this end, we propose a novel deep lifelong cross-modal hashing method that achieves lifelong hashing retrieval instead of repeatedly retraining the hash function when new data arrive. Specifically, we design a lifelong learning strategy that updates hash functions by training directly on the incremental data instead of retraining new hash functions on all the accumulated data, which significantly reduces training time. Then, we propose a lifelong hashing loss that lets the original hash codes participate in lifelong learning while remaining invariant, and that further preserves the similarity and dissimilarity among original and incremental hash codes to maintain performance. Additionally, considering the distribution heterogeneity that arises when new data arrive continuously, we introduce multi-label semantic similarity to supervise hash learning, and we show through detailed analysis that this similarity improves performance. Experimental results on benchmark datasets show that the proposed method achieves performance comparable to recent state-of-the-art cross-modal hashing methods, yielding average gains of over 20% in retrieval accuracy and reducing training time by over 80% when new data arrive continuously.

“…Hashing (Wang et al 2018a; Gao et al 2023; Kou et al 2022; Wang et al 2021) has gained increasing interests in many large-scale applications. Its primary goal is to encode • We propose a novel distributed manifold hashing (DMH) method for compact image set representation.…”

Section: Introduction (mentioning)

confidence: 99%

“…Hashing Hashing (Wang et al 2018a,a; Gao et al 2023; Kou et al 2022; Wang et al 2021) aims to learn compact hash code in Hamming space while preserving similarity, which can be particularly useful for efficient large-scale search. Many existing hashing methods (Gong et al 2013; Shen et al 2015) employ machine learning technique to learn hash function and hash code.…”

Section: Introduction (mentioning)

confidence: 99%

Distributed Manifold Hashing for Image Set Classification and Retrieval

Shen, Song, Yuan, et al. 2024

AAAI

Conventional image set methods typically learn from image sets stored in one location. However, in real-world applications, image sets are often distributed or collected across different locations. Learning from such distributed image sets presents a challenge that has not been studied thus far. Moreover, efficiency is seldom addressed in large-scale image set applications. To fill these gaps, this paper proposes Distributed Manifold Hashing (DMH), which models distributed image sets as a connected graph. DMH employs a Riemannian manifold to effectively represent each image set and further suggests learning a hash code for each image set to achieve efficient computation and storage. DMH is formulated as a distributed learning problem with a local consistency constraint on global variables among neighbor nodes, and can be optimized in parallel. Extensive experiments on three benchmark datasets demonstrate that DMH achieves highly competitive accuracy in a distributed setting and provides faster classification and retrieval than state-of-the-art methods.

“…As shown in Figure 1(a), only a short video segment semantically matches the query, while most of the video contents are query-irrelevant. Clearly, TSG tries to break through the barrier between computer vision and natural language processing techniques for more challenging cross-modal grounding (Li et al, ,a, 2022Wang and Shi, 2023;Wang et al, 2021aWang et al, , 2020c.…”

Section: Introduction (mentioning)

confidence: 99%

Annotations Are Not All You Need: A Cross-modal Knowledge Transfer Network for Unsupervised Temporal Sentence Grounding

Fang, Liu, Fang, et al. 2023

Findings of the Association for Computational Linguistics: EMNLP 2023

This paper addresses the task of temporal sentence grounding (TSG). Although many works have made solid progress on this important topic, they rely heavily on massive, expensive video-query paired annotations, which require a tremendous amount of human effort to collect in real-world applications. To this end, we target a more practical but challenging TSG setting: unsupervised temporal sentence grounding, where both paired video-query and segment boundary annotations are unavailable during network training. Considering that some other cross-modal tasks provide many easily available yet cheap labels, we collect their simple cross-modal alignment knowledge and transfer it to our complex scenarios: 1) We first explore the entity-aware object-guided appearance knowledge from the paired Image-Noun task, and adapt it to each independent video frame; 2) Then, we extract the event-aware action representation from the paired Video-Verb task, and further refine the action representation for more practical but complicated real-world cases by a newly proposed copy-paste approach; 3) By modulating and transferring both appearance and action knowledge into our challenging unsupervised task, our model can directly utilize this general knowledge to correlate videos and queries, and accurately retrieve the relevant segment without training. Extensive experiments on two challenging datasets (ActivityNet Captions and Charades-STA) show the effectiveness of our approach, which outperforms existing unsupervised methods and is even competitive with supervised works.
