2021
DOI: 10.48550/arxiv.2106.04538
Preprint

What Makes Multi-modal Learning Better than Single (Provably)

Abstract: The world provides us with data of multiple modalities. Intuitively, models fusing data from different modalities outperform unimodal models, since more information is aggregated. Recently, joining the success of deep learning, there is an influential line of work on deep multimodal learning, which has remarkable empirical results on various applications. However, theoretical justifications in this field are notably lacking. Can multimodal provably perform better than unimodal? In this paper, we answer this qu…

Cited by 7 publications (5 citation statements)
References 20 publications
“…The fusion method achieves much higher results than audio classifiers or text classifiers. As in the [36] study, multi-modality outperforms single since the former has access to a better latent space. From Tables 5 and 6, it can be concluded that MER text experiments consistently achieve higher accuracy than MER audio. This happens because the complexity of Mel-spectrogram data from audio is much higher.…”
Section: Results (mentioning)
confidence: 59%
“…Given that mpox is an infectious disease, it is recommended that primary healthcare providers combine epidemiological survey results when using the mpox-AISM app to make more accurate diagnoses and guide patients to refer them to professional institutions. 44 In addition, considering that models based on multimodal inputs have better learning ability compared to single modal input models, 45 we are planning to develop a multimodal model that combines rash images and epidemiological survey results, aiming to improve the diagnostic accuracy and stability of mpox-AISM.…”
Section: Discussion (mentioning)
confidence: 99%
“…Additionally, the multimodal model closely aligns with the diagnostic behavior of clinicians and offers better interpretability. 36 In this context, we utilized the soft voting method in ensemble learning to construct the multimodal DL model for verifying its diagnostic assistance to clinicians of varying expertise levels during subsequent testing. We selected the voting method because it closely approximates the decision-making process of clinicians, prioritizing more important image types.…”
Section: Methods (mentioning)
confidence: 99%