2019 IEEE International Conference on Multimedia and Expo (ICME)
DOI: 10.1109/icme.2019.00089
Adversarial Cross-Modal Retrieval via Learning and Transferring Single-Modal Similarities

Abstract: Cross-modal retrieval aims to retrieve relevant data across different modalities (e.g., texts vs. images). The common strategy is to apply element-wise constraints between manually labeled pair-wise items to guide the generators to learn the semantic relationships between the modalities, so that similar items can be projected close to each other in the common representation subspace. However, such constraints often fail to preserve the semantic structure between unpaired but semantically similar items (e.g., …)
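The element-wise pairwise constraint the abstract describes can be sketched as follows (a minimal NumPy illustration; the array names, dimensions, and data are invented for this example — note the loss only sees labeled pairs, so unpaired but semantically similar items exert no pull on each other):

```python
import numpy as np

def pairwise_alignment_loss(img_emb, txt_emb):
    """Element-wise constraint: pull each labeled (image, text) pair
    together in the common space via mean squared distance."""
    return float(np.mean(np.sum((img_emb - txt_emb) ** 2, axis=1)))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))              # 4 paired items in an 8-d common space
txt = img + 0.1 * rng.normal(size=(4, 8))  # texts near their paired images
loss = pairwise_alignment_loss(img, txt)
print(loss)
```

Because the objective is summed only over annotated pairs, two unpaired items of the same semantic class receive no constraint at all, which is the limitation the paper targets.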

Cited by 19 publications (17 citation statements)
References 17 publications
“…SLSAE was compared with 10 methods belonging to two categories: traditional statistical methods and DNN-based methods. Traditional cross-modal retrieval methods include CCA [8], LCFS [23] and JRL [24], and DNN-based methods include Corr-AE [13], CMDN [11], MHTN [25], Deep-SM [12], ACMR [5], MASLN [26] and CMST [38].…”
Section: Comparison With Existing Methods (mentioning)
confidence: 99%
“…If the modality classifier cannot distinguish the modality of the embeddings in the common space, the feature projectors generate modality-invariant embeddings, which reduces the modality gap. The cross-modal similarity transferring (CMST) method [38] tries to learn the single-modal similarities and transfer them to the common subspace with adversarial learning. The modal-adversarial hybrid transfer network (MHTN) [25] aims to transfer knowledge from a single-modal source domain to a cross-modal target domain and learn the cross-modal common representation.…”
Section: Related Work (mentioning)
confidence: 99%
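The adversarial objective described above — a modality classifier that the feature projectors try to confuse — can be sketched with a toy linear discriminator (a minimal NumPy illustration; the embeddings, weights, and labels are invented for this example):

```python
import numpy as np

def modality_ce(logits, labels):
    """Binary cross-entropy of the modality classifier
    (label 0 = image embedding, 1 = text embedding)."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))

rng = np.random.default_rng(1)
emb = rng.normal(size=(6, 4))          # 6 embeddings in a 4-d common space
labels = np.array([0, 0, 0, 1, 1, 1])  # first three images, last three texts
w = rng.normal(size=4)                 # toy linear modality classifier
logits = emb @ w

d_loss = modality_ce(logits, labels)   # the classifier minimizes this
g_loss = -d_loss                       # the projectors maximize classifier confusion
```

When the classifier can no longer beat chance, the two sides of `g_loss`/`d_loss` balance out and the embeddings carry no modality-identifying signal — the modality-invariance property the passage refers to.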
“…Zhang et al [16] proposed an adversarial learning based method to learn attention masks for cross-modal feature generation. Wen et al [17] introduced a new cross-modal similarity transferring (CMST) method based on adversarial learning. This method learns the common representation subspace from quantitative similarities computed in the single-modal representation subspaces.…”
Section: Deep Learning (mentioning)
confidence: 99%
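The similarity-transfer idea quoted above — making similarities in the common subspace match those measured in a single-modal subspace — can be sketched as matching cosine-similarity matrices (a minimal NumPy illustration; CMST itself uses adversarial learning rather than this plain MSE, and all names and data here are invented for the example):

```python
import numpy as np

def cosine_sim(x):
    """Row-wise cosine similarity matrix."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T

def similarity_transfer_loss(single_modal_feats, common_emb):
    """Penalize disagreement between similarities in the single-modal
    space and similarities among the common-space embeddings."""
    s_src = cosine_sim(single_modal_feats)
    s_tgt = cosine_sim(common_emb)
    return float(np.mean((s_src - s_tgt) ** 2))

rng = np.random.default_rng(2)
feats = rng.normal(size=(5, 16))         # single-modal features
emb = feats @ rng.normal(size=(16, 4))   # projected common embeddings
loss = similarity_transfer_loss(feats, emb)
```

Unlike a pairwise constraint, this objective also relates unpaired items: any two items that are similar in the single-modal space are pushed to be similar in the common space.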
“…To overcome this challenge, for the first time, this paper proposes to combine cross-modal correlation learning and adversarial learning, developing an end-to-end framework that bridges the semantic gap and diminishes the cross-modal heterogeneity. Different from the existing studies [15][16][17][18], we combine deep CCA based cross-modal correlation learning and adversarial learning to not only learn the semantic correlations that bridge the semantic gap between different modalities, but also implement a better cross-modal distribution alignment to diminish the cross-modal heterogeneity.…”
Section: Introduction (mentioning)
confidence: 99%
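The CCA-style correlation objective this citing paper pairs with adversarial learning maximizes the correlation between the two projected views. A per-dimension Pearson correlation gives the flavor (a minimal NumPy illustration, not the full deep CCA objective; the views and noise level are invented for this example):

```python
import numpy as np

def view_correlation(x, y):
    """Mean per-dimension Pearson correlation between two views;
    a CCA-style objective maximizes this quantity."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    num = (xc * yc).sum(axis=0)
    den = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum(axis=0))
    return float(np.mean(num / den))

rng = np.random.default_rng(3)
x = rng.normal(size=(50, 3))                   # projected image view
y = 0.9 * x + 0.1 * rng.normal(size=(50, 3))   # strongly correlated text view
corr = view_correlation(x, y)
```

Correlation alone aligns paired samples but not whole distributions; the adversarial term in the cited framework is what handles the distribution-level alignment.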