Leveraging hierarchy in multimodal generative models for effective cross-modality inference
2022 | DOI: 10.1016/j.neunet.2021.11.019

Cited by 8 publications (10 citation statements: 0 supporting, 10 mentioning, 0 contrasting).
References 22 publications.
“…• An unsupervised learning problem, where we learn multimodal representations on the Multimodal Handwritten Digits (MHD) dataset (Vasco et al., 2022). We showcase the geometric alignment of representations and demonstrate the superior performance of GMC compared to the baselines on a downstream classification task with missing modalities (Section 5.1);…”
Section: Methods (mentioning)
confidence: 98%
“…Recently, hierarchical multimodal VAEs have been proposed to facilitate the learning of aligned multimodal representations, such as Nexus (Vasco et al., 2022) and Multimodal Sensing (MUSE) (Vasco et al., 2021). Nexus considers a two-level hierarchy of modality-specific and multimodal representation spaces, employing a dropout-based training scheme.…”
Section: Related Work (mentioning)
confidence: 99%
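The quote above describes the two-level idea behind Nexus at a high level. The following is a minimal, illustrative sketch of that idea, not the authors' implementation: all class names, layer sizes, and the drop_p parameter are assumptions chosen for brevity. Bottom-level encoders produce per-modality latents; a top-level aggregator maps them to a shared code, and entire modalities are randomly dropped during training so the shared code remains inferable from any subset of modalities.

# Minimal sketch of a Nexus-style two-level hierarchy (illustrative only;
# names and dimensions are made up, not the paper's architecture).
import torch
import torch.nn as nn

class TwoLevelMultimodal(nn.Module):
    def __init__(self, mod_dims, z_dim=16, nexus_dim=32):
        super().__init__()
        # Bottom level: one encoder per modality.
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, z_dim))
             for d in mod_dims]
        )
        # Top level: aggregate concatenated modality latents into a shared code.
        self.nexus = nn.Sequential(
            nn.Linear(z_dim * len(mod_dims), 64), nn.ReLU(),
            nn.Linear(64, nexus_dim),
        )

    def forward(self, xs, drop_p=0.3):
        zs = []
        for enc, x in zip(self.encoders, xs):
            z = enc(x)
            # Dropout-based training scheme: with probability drop_p,
            # zero out an entire modality's latent during training.
            if self.training and torch.rand(()) < drop_p:
                z = torch.zeros_like(z)
            zs.append(z)
        return self.nexus(torch.cat(zs, dim=-1))

# Usage with three toy modalities (e.g. image, trajectory, sound features):
model = TwoLevelMultimodal(mod_dims=[784, 200, 128])
xs = [torch.randn(8, d) for d in (784, 200, 128)]
shared = model(xs)  # shared multimodal representation, shape (8, 32)

Because whole modalities are dropped rather than individual units, the top level is explicitly trained for the missing-modality inference setting that the cross-modality experiments evaluate.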
“…We compared our MLD method to MVAE [21], MMVAE [22], MOPOE [23], Hierarchical Generative Model (NEXUS) [26], Multi-view Total Correlation Autoencoder (MVTCAE) [27], and MMVAE+ [29], re-implementing all competitors in the same code base as our method and selecting their best hyperparameters as indicated by the authors (see Appendix D for more details). For a fair comparison, we used the same encoder/decoder architecture for all models.…”
Section: Methods (mentioning)
confidence: 99%
“…The Multimodal Handwritten Digits dataset (MHD) [26] contains gray-scale images of digits, the motion trajectory of the handwriting, and the sounds of the spoken digits.…”
Section: MHD (mentioning)
confidence: 99%
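To make the three-modality structure of MHD concrete, here is a hypothetical container for one sample. The field names and array shapes are assumptions for illustration only; consult the MHD release (Vasco et al., 2022) for the actual file format.

# Hypothetical sketch of one MHD sample (shapes are assumed, not official).
from dataclasses import dataclass
import numpy as np

@dataclass
class MHDSample:
    image: np.ndarray       # gray-scale digit image, e.g. (28, 28)
    trajectory: np.ndarray  # handwriting motion, e.g. (T, 2) pen positions
    sound: np.ndarray       # spoken-digit audio features, e.g. a spectrogram
    label: int              # digit class 0-9

# A toy sample with random data, just to show the structure.
sample = MHDSample(
    image=np.random.rand(28, 28).astype(np.float32),
    trajectory=np.random.rand(50, 2).astype(np.float32),
    sound=np.random.rand(32, 128).astype(np.float32),
    label=7,
)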