2020
DOI: 10.1007/978-3-030-47426-3_24

Deep Multimodal Clustering with Cross Reconstruction

Abstract: Recently, there has been a surge of interest in multimodal clustering, in which extracting common features plays a critical role. However, because they ignore the fact that data in different modalities share similar distributions in feature space, most existing works do not fully mine the inter-modal distribution relationships, which ultimately leads to poor common features. To address this issue, we propose the deep multimodal clustering with cross reconstruction method, which firstly foc…

Cited by 8 publications (3 citation statements)
References 14 publications
“…The reconstruction task is the most common self-supervised pretext task, both for SV-SSL and for MV-SSL. In SV-SSL, the views are reconstructed from their respective view-specific representations, without any influence from the other views [12,13,[18][19][20][21][22][23]. In MV-SSL, it is common to either do (i) cross view reconstruction, where all views are reconstructed from all view-specific representations [24]; or (ii) fused view reconstruction, where all views are reconstructed from the fused representation [5,7,14,24].…”
Section: Previous Methods As Instances Of DeepMVC
confidence: 99%
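The distinction drawn above between cross-view and fused-view reconstruction can be made concrete with a short sketch. The snippet below is a minimal illustration assuming a toy two-view setup with small MLP encoders and linear decoders trained with mean-squared error; all module names, dimensions, and loss choices are assumptions for illustration and are not taken from the cited works.

```python
import torch
import torch.nn as nn

class TwoViewAutoencoder(nn.Module):
    """Toy two-view model contrasting cross-view and fused-view reconstruction."""

    def __init__(self, dim_v1=20, dim_v2=30, dim_z=10):
        super().__init__()
        # View-specific encoders.
        self.enc1 = nn.Sequential(nn.Linear(dim_v1, dim_z), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Linear(dim_v2, dim_z), nn.ReLU())
        # (i) Cross-view decoders: each view is decoded from *every* view-specific code.
        self.dec1_from_z1 = nn.Linear(dim_z, dim_v1)
        self.dec1_from_z2 = nn.Linear(dim_z, dim_v1)
        self.dec2_from_z1 = nn.Linear(dim_z, dim_v2)
        self.dec2_from_z2 = nn.Linear(dim_z, dim_v2)
        # (ii) Fused-view decoders: each view is decoded from the fused code.
        self.dec1_from_fused = nn.Linear(2 * dim_z, dim_v1)
        self.dec2_from_fused = nn.Linear(2 * dim_z, dim_v2)

    def forward(self, x1, x2):
        z1, z2 = self.enc1(x1), self.enc2(x2)
        mse = nn.functional.mse_loss
        # (i) Cross-view reconstruction: every view from every view-specific code.
        loss_cross = (mse(self.dec1_from_z1(z1), x1) + mse(self.dec1_from_z2(z2), x1)
                      + mse(self.dec2_from_z1(z1), x2) + mse(self.dec2_from_z2(z2), x2))
        # (ii) Fused-view reconstruction: every view from the concatenated (fused) code.
        fused = torch.cat([z1, z2], dim=1)
        loss_fused = (mse(self.dec1_from_fused(fused), x1)
                      + mse(self.dec2_from_fused(fused), x2))
        return loss_cross, loss_fused

# Usage with random data, batch of 8 samples per view.
x1, x2 = torch.randn(8, 20), torch.randn(8, 30)
loss_cross, loss_fused = TwoViewAutoencoder()(x1, x2)
```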
“…Lastly, some methods adopt a two-stage approach, where they first use the SSL components to learn representations, and then apply a traditional clustering method, such as kmeans [3,8,20,22,28], a Gaussian mixture model [14], or spectral clustering [12], on the trained representations.…”
Section: Clustering Modules
confidence: 99%
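As a rough illustration of the two-stage approach quoted above, the sketch below applies off-the-shelf clustering algorithms from scikit-learn to pre-computed embeddings. The random embeddings, the cluster count of 10, and the parameter choices are placeholders and do not reproduce the settings of the cited papers; stage 1 (training the self-supervised representation) is assumed to have happened elsewhere.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.mixture import GaussianMixture

# Stage 1 (assumed done elsewhere): a self-supervised model maps each sample
# to an embedding. Random features stand in for the trained representations.
embeddings = np.random.randn(500, 32)

# Stage 2: run a traditional clustering method on the frozen embeddings.
labels_kmeans = KMeans(n_clusters=10, n_init=10).fit_predict(embeddings)
labels_gmm = GaussianMixture(n_components=10).fit_predict(embeddings)
labels_spectral = SpectralClustering(
    n_clusters=10, affinity="nearest_neighbors"
).fit_predict(embeddings)
```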
“…To this aim, we propose to modify/extend both the divergence and reconstruction loss terms in L_VAE while using the notion of cross-reconstruction. Cross-reconstruction has been used in different settings, for example, for multimodal clustering [65], zero-shot learning [66], and learning transformation-invariant representations [67]. In the current context, we implement cross-reconstruction as follows: For a given instance (x, y), we select a second instance (u, v) at random from the temporal vicinity of (x, y).…”
Section: Enforcing Temporal Stability In Latent Representations
confidence: 99%
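The temporal cross-reconstruction step quoted above can be sketched as follows: for each sample, a neighbour is drawn at random from a small temporal window, and the decoder is penalised for failing to reconstruct that neighbour from the current sample's latent code. The window size, the MSE penalty, and the encoder/decoder interfaces below are assumed simplifications; the citing paper works with a VAE objective whose divergence and reconstruction terms are modified, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn

def temporal_cross_reconstruction_loss(encoder, decoder, sequence, window=5):
    """For each frame x_t, draw a neighbour x_u with |u - t| <= window and
    penalise the error of reconstructing x_u from the latent code of x_t.

    `sequence` is a (T, D) tensor of temporally ordered samples; `encoder`
    and `decoder` are arbitrary modules mapping D -> latent and latent -> D.
    """
    T = sequence.shape[0]
    t = torch.arange(T)
    offsets = torch.randint(-window, window + 1, (T,))
    u = (t + offsets).clamp(0, T - 1)      # indices of the temporal neighbours
    z = encoder(sequence)                  # latent codes of the original frames
    recon = decoder(z)                     # decoded from x_t's own code ...
    return nn.functional.mse_loss(recon, sequence[u])  # ... compared against x_u

# Usage with a toy linear encoder/decoder and a random "sequence" of 100 frames.
enc, dec = nn.Linear(16, 4), nn.Linear(4, 16)
loss = temporal_cross_reconstruction_loss(enc, dec, torch.randn(100, 16))
```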