2021
DOI: 10.48550/arxiv.2107.07502
Preprint

MultiBench: Multiscale Benchmarks for Multimodal Representation Learning

Abstract: Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. In order…

Cited by 6 publications (6 citation statements)
References 97 publications (206 reference statements)
“…We select seven multimodal or multitask datasets as the source to create six AFL simulations. The seven source datasets are summarized in Table 3 in Appendix, including two image classification datasets (Finn, Abbeel, and Levine 2017), a bimodal driving dataset (Duarte and Hu 2004), a bimodal 3D object recognition dataset (Wu et al. 2015; Feng et al. 2019), a three-modal two-task multimedia emotion recognition dataset and a bimodal audio-image classification dataset (Liang et al. 2021). We then create six simulations from these datasets.…”
Section: Experiments: AFL Simulation Setup
Confidence: 99%
“…As the human brain can by design integrate sensory data to interpret the world, multimodal learning also aims to fuse information from different modalities for an improved decision-making process. These methods have demonstrated remarkable potential in many fields [5], including but not limited to computer vision, natural language processing (NLP), healthcare, and surveillance systems.…”
Section: Introduction
Confidence: 99%
“…Multimodal representation learning is a branch of multimodal machine learning and has gained great attention in vision and language research [28][29][30][31][32]. Many multimodal representation learning methods have been proposed to learn intramodality and intermodality interactions [31,33,34].…”
Section: Introduction
Confidence: 99%