2022
DOI: 10.48550/arxiv.2202.03390
Preprint

Geometric Multimodal Contrastive Representation Learning

Abstract: Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method comprised of two main components: i) a two-level architecture consisting of modality-specific base encoders, allowing it to process an arbitrary number of modalities into an intermediate represent…
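The truncated abstract describes modality-specific encoders mapped into a shared intermediate space. A minimal sketch of how such representations might be aligned contrastively is below; the abstract does not give the exact GMC objective, so the NT-Xent-style formulation, function name, and temperature value are assumptions for illustration only.

```python
import numpy as np

def gmc_contrastive_loss(modality_emb, joint_emb, temperature=0.1):
    """Sketch of a GMC-style contrastive loss: each modality-specific
    embedding is pulled toward the joint (all-modalities) embedding of
    the same sample and pushed away from other samples' joint embeddings.
    Both inputs have shape (batch, dim). The exact objective used by GMC
    may differ; this is an assumed NT-Xent-style stand-in."""
    # L2-normalise so dot products become cosine similarities
    m = modality_emb / np.linalg.norm(modality_emb, axis=1, keepdims=True)
    j = joint_emb / np.linalg.norm(joint_emb, axis=1, keepdims=True)
    logits = m @ j.T / temperature  # (batch, batch) similarity matrix
    # Softmax cross-entropy with the diagonal (matching pairs) as targets
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Because the loss only compares each modality's embedding with the joint embedding, a model trained this way can still produce usable representations when some modalities are missing at test time, which matches the robustness claim in the abstract.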

Cited by 2 publications (4 citation statements)
References 14 publications
“…By adopting cross-modal attention modules, MulT successfully fuses the information of multiple modalities and achieves superior performance to standard early-fusion and late-fusion approaches. Poklukar et al. [46] further enhance the learned representation of MulT with their Geometric Multimodal Contrastive (GMC) loss (i.e., L_CE + L_GMC) and achieve new state-of-the-art performance when modalities are missing from observed instances. For DMM, all experimental settings are kept the same as in [46]; we simply add our Dynamic Mixed Margin loss to GMC with a hyperparameter ξ (i.e., L_CE + L_GMC + ξ·L_DMM) that controls the relative magnitude of DMM against the other loss terms.…”
Section: Methods
Confidence: 99%
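The weighted objective quoted in this citation statement can be written out directly. A minimal sketch is below; the function name is invented, and the default value of xi is purely illustrative since the statement does not report a chosen value.

```python
def dmm_total_loss(l_ce, l_gmc, l_dmm, xi=0.5):
    """Combined objective described in the citing work: the cross-entropy
    and GMC terms are kept as in [46], and the Dynamic Mixed Margin term
    is added with a scalar weight xi that controls its relative magnitude
    against the other loss terms. xi=0.5 is an illustrative default, not
    a value reported by the authors."""
    return l_ce + l_gmc + xi * l_dmm
```

Setting xi to zero recovers the original L_CE + L_GMC objective of [46], which makes the DMM term a strict add-on rather than a replacement.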
“…In this paper, we propose a modified contrastive learning method, dynamic mixed margin (DMM), that is compatible with diverse previous works. To validate our method, we apply it to COOT [1] and other variants [46] for video-text retrieval and video classification tasks.…”
Section: Video-Text Retrieval
Confidence: 99%