Proceedings of the 30th ACM International Conference on Multimedia 2022
DOI: 10.1145/3503161.3548187
Differentiable Cross-modal Hashing via Multimodal Transformers

Cited by 26 publications (7 citation statements)
References 25 publications
“…Since binary optimization is hard, most pair-wise methods employ a continuous relaxation strategy during training, which inevitably leads to quantization errors. To address this problem, Tu et al. introduced a selection mechanism in their Differentiable Cross-modal Hashing via Multimodal Transformers (DCHMT) algorithm, making binary optimization differentiable [51]. More recently, Zhang et al. introduced Modality-Invariant Asymmetric Networks (MIAN) to study asymmetric intra-modal and inter-modal similarity preservation within a probabilistic modality alignment framework [52].…”
Section: B. Deep Supervised Cross-modal Hashing
confidence: 99%
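The selection idea described above can be sketched roughly as follows. This is a minimal illustration of why a softmax "selection" between the two candidate bit values is differentiable while `sign()` is not; it is not DCHMT's actual module, and the function names, the two per-bit branch scores, and the temperature `tau` are all assumptions for the sketch.

```python
import numpy as np

def soft_select_bits(logits_pos, logits_neg, tau=1.0):
    """Differentiable bit 'selection': a temperature softmax over the two
    candidate values {+1, -1} per bit, so gradients can flow (unlike sign())."""
    scores = np.stack([logits_pos, logits_neg], axis=-1) / tau
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    # expected value of the selected bit: p(+1) * 1 + p(-1) * (-1)
    return probs[..., 0] - probs[..., 1]

def hard_bits(soft):
    """Quantize the soft codes to exact binary {+1, -1} for retrieval."""
    return np.where(soft >= 0, 1.0, -1.0)

# each bit is chosen by whichever branch scores higher, but smoothly
soft = soft_select_bits(np.array([2.0, -1.0, 0.5]), np.array([0.0, 1.0, 0.4]))
codes = hard_bits(soft)
```

Because the soft codes lie strictly inside (-1, 1), the gap between them and the quantized codes is exactly the quantization error the statement mentions; the selection keeps that gap trainable.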
“…Meanwhile, the Bag-of-Words (BoW) model is usually employed to represent the text [56], and an MLP is then employed to extract the text's latent semantic information. However, due to the sparse nature of Bag-of-Words, previous studies have demonstrated that this approach may ignore valuable text information [11], [28], [51], leading to sub-optimal hash codes. Therefore, inspired by [32], transformer encoders are introduced to implement our proposed Image-Network and Text-Network, obtaining latent feature representations from the image and text modalities, respectively.…”
Section: Feature Learning
confidence: 99%
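The information loss the statement attributes to Bag-of-Words is easy to see in a toy encoding: word order is discarded entirely, and out-of-vocabulary terms vanish. The tiny vocabulary and helper below are hypothetical, for illustration only.

```python
from collections import Counter

def bow_vector(text, vocab):
    """Bag-of-Words: unordered term counts over a fixed vocabulary.
    Word order and out-of-vocabulary words are discarded, which is
    the information loss the cited passage refers to."""
    counts = Counter(text.lower().split())
    return [counts.get(word, 0) for word in vocab]

vocab = ["a", "dog", "chases", "cat", "the"]
vec = bow_vector("The dog chases the cat", vocab)
# "the cat chases the dog" yields the identical vector: order is lost,
# which is one reason sequence-aware transformer encoders help.
```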
“…6) Transformer methods: Transformer methods [160], [161] leverage Transformer encoders to effectively capture fine-grained semantic information and introduce label information as supervision to model complex relationships within multi-modal data. For instance, Differentiable Cross-modal Hashing via Multi-modal Transformers (DCHMT) [160] constructs a multi-modal transformer to capture detailed cross-modal semantic information and introduces a micro-hashing module for mapping modal representations into hash codes.…”
Section: Supervised Cross-modal Hashing Retrieval
confidence: 99%
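A generic hashing module of the kind the statement describes can be sketched as a tanh-relaxed linear projection followed by sign quantization, with retrieval by Hamming distance. This is the common deep-hashing pattern, not DCHMT's specific micro-hashing module; the function names and shapes are assumptions.

```python
import numpy as np

def to_hash_codes(features, W):
    """Map continuous modality features to K-bit codes: a linear projection
    relaxed by tanh during training, quantized by sign at retrieval time."""
    relaxed = np.tanh(features @ W)        # differentiable surrogate
    return np.where(relaxed >= 0, 1, -1)   # binary {+1, -1} codes

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code.
    For {+1, -1} codes of length K, distance = (K - dot product) / 2."""
    k = query_code.shape[0]
    dists = (k - db_codes @ query_code) / 2
    return np.argsort(dists)
```

Ranking by the inner product of codes is why cross-modal retrieval with hash codes is fast: it reduces to integer dot products rather than float similarity over dense features.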
“…Compared with shallow hashing methods that manually extract features, deep-learning-based hashing methods greatly improve the distinctiveness and effectiveness of the extracted features. Deep Cross-modal Hashing (DCMH) [8] was the first to combine deep learning with hash learning, proposing an end-to-end framework that integrates feature learning and hash learning. Since then, many deep-learning-based hashing methods have been studied, such as [9], [14]-[18]. Most existing hashing methods use the co-occurrence information of input image-text pairs or manually annotated semantic tags.…”
Section: Hash Learning
confidence: 99%
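The supervision from co-occurring image-text pairs or semantic tags that the statement mentions is typically encoded as a pairwise similarity matrix, trained with a negative log-likelihood objective in the style popularized by DCMH. The sketch below illustrates that pattern; the function names and the toy labels are assumptions, not code from any cited paper.

```python
import numpy as np

def pairwise_similarity(labels_a, labels_b):
    """S[i, j] = 1 if items i and j share at least one semantic tag, else 0.
    This matrix is the pairwise supervision signal the passage describes."""
    return ((labels_a @ labels_b.T) > 0).astype(float)

def pairwise_nll(codes_img, codes_txt, S):
    """DCMH-style pairwise negative log-likelihood over code inner products:
    similar pairs (S = 1) are pushed toward large inner products."""
    theta = 0.5 * codes_img @ codes_txt.T
    return float(np.mean(np.log1p(np.exp(theta)) - S * theta))

labels = np.array([[1, 0], [0, 1], [1, 1]])   # toy multi-label annotations
S = pairwise_similarity(labels, labels)
```

Note that the objective only needs tags (or co-occurrence) to build `S`; it never requires per-pair manual similarity judgments, which is what makes this supervision cheap to obtain.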