Incomplete multi-view clustering aims to enhance clustering performance by using data from multiple modalities. Although several approaches have been proposed for this problem, the following drawbacks persist: (1) it is difficult to learn latent representations that account for complementarity yet preserve consistency without using label information; and (2) existing methods fail to take full advantage of the hidden information in incomplete data, which leads to suboptimal clustering performance when complete data is scarce. In this study, we propose Contrastive Incomplete Multi-View Image Clustering with Generative Adversarial Networks (CIMIC-GAN), which uses a Generative Adversarial Network (GAN) to fill in incomplete data and double contrastive learning to learn consistency on complete and incomplete data. More specifically, considering the diversity and complementary information among multiple modalities, we incorporate the autoencoding representations of complete and incomplete data into double contrastive learning to achieve consistency learning. Integrating GANs into the autoencoding process not only takes full advantage of the new features of incomplete data, but also generalises the model better in the presence of high data-missing rates. Experiments conducted on four widely used data sets show that CIMIC-GAN outperforms state-of-the-art incomplete multi-view clustering methods.
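The abstract describes the pipeline only at a high level. Below is a minimal PyTorch sketch of the idea for two views: per-view autoencoders, a generator that imputes the missing view from the observed one, a discriminator that judges the imputation, and a single NT-Xent contrastive term pulling the two latent codes together. All module shapes, names, and loss weights are illustrative assumptions, not the authors' implementation, and the paper's "double" contrastive scheme is reduced here to one term for brevity.

```python
# Illustrative two-view sketch of a CIMIC-GAN-style step (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewAutoencoder(nn.Module):
    def __init__(self, in_dim: int, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def contrastive_loss(z1, z2, temperature: float = 0.5):
    """NT-Xent-style loss treating (z1[i], z2[i]) as the positive pair."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature           # cosine similarity matrix
    labels = torch.arange(z1.size(0))            # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# Generator imputes the view-2 input from the view-1 latent code;
# the discriminator judges whether an imputed sample looks real.
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784))
D = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1))
ae1, ae2 = ViewAutoencoder(784), ViewAutoencoder(784)

x1 = torch.randn(32, 784)                       # observed view 1 (toy data)
z1, rec1 = ae1(x1)
x2_fake = G(z1)                                 # impute the missing view 2
z2, _ = ae2(x2_fake)

# Reconstruction, adversarial, and contrastive terms (weights are assumptions).
adv = F.binary_cross_entropy_with_logits(D(x2_fake), torch.ones(32, 1))
rec = F.mse_loss(rec1, x1)
con = contrastive_loss(z1, z2)
loss = rec + 0.1 * adv + 0.5 * con
loss.backward()
```

In practice the discriminator would be trained in alternation on real view-2 samples versus detached imputations, as in any GAN setup; the sketch shows only the generator-side objective.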
Deep multi-modal clustering is a challenging task for data analysis, since it must learn a universal semantic representation to find correct clusters among heterogeneous samples. However, most existing methods (1) lack an effective approach to obtaining a global representation of visual instances, which results in a large semantic gap between the visual and textual spaces, and (2) hardly consider the partial multi-modal setting, in which each instance is represented by only one modality; in reality, pairing information is not available for all instances. To tackle these issues, we propose a novel model called the Two-Stage Partial Image-Text Clustering (TPIT-C) model. First, we build an interpretable reasoning network that extracts the salient regions and semantic concepts of a scene in order to generate global semantic concepts. Second, we construct an adversarial learning module that aligns textual and visual instances into a unified space by virtue of cycle-consistency. Experimental evaluations on public unpaired multi-modal datasets illustrate the better performance of the proposed method and the effectiveness of our algorithm on the partial image-text clustering task.
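The second stage of the abstract, adversarial alignment with cycle-consistency, is the most code-like part. The sketch below shows the general pattern under stated assumptions: two mapping networks carry image features into text space and back, a discriminator pushes mapped image features to be indistinguishable from real text features, and an L1 cycle term keeps the round trip close to the input. Feature dimensions, module names, and loss weights are all hypothetical, not the authors' code.

```python
# Illustrative adversarial alignment with cycle-consistency (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

img_dim, txt_dim = 512, 300                     # assumed feature sizes

G_it = nn.Sequential(nn.Linear(img_dim, 400), nn.ReLU(),
                     nn.Linear(400, txt_dim))   # image space -> text space
G_ti = nn.Sequential(nn.Linear(txt_dim, 400), nn.ReLU(),
                     nn.Linear(400, img_dim))   # text space -> image space
D_t = nn.Sequential(nn.Linear(txt_dim, 128), nn.ReLU(),
                    nn.Linear(128, 1))          # discriminates real text features

img_feat = torch.randn(32, img_dim)             # unpaired visual instances (toy data)
txt_feat = torch.randn(32, txt_dim)             # unpaired textual instances (toy data)

fake_txt = G_it(img_feat)                       # map images into the text space
cycle_img = G_ti(fake_txt)                      # map back for the cycle constraint

# Generator objective: fool the discriminator and close the cycle.
adv = F.binary_cross_entropy_with_logits(D_t(fake_txt), torch.ones(32, 1))
cyc = F.l1_loss(cycle_img, img_feat)
loss_G = adv + 10.0 * cyc                       # cycle weight is an assumption
loss_G.backward()

# Discriminator objective, trained in a separate alternating step:
d_loss = (F.binary_cross_entropy_with_logits(D_t(txt_feat), torch.ones(32, 1)) +
          F.binary_cross_entropy_with_logits(D_t(fake_txt.detach()), torch.zeros(32, 1)))
```

Because the instances are unpaired, nothing in these losses references correspondence between `img_feat[i]` and `txt_feat[i]`; alignment is enforced only at the distribution level plus the per-sample cycle constraint.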