Most existing deep-learning-based sentiment classification methods require large amounts of human-annotated data, but labeling high-quality emotional texts at scale is labor-intensive. Users on social platforms generate massive amounts of tagged opinionated text (e.g., tweets, customer reviews), providing a new resource for training deep models. However, some tagged instances carry sentiment tags diametrically opposed to their true semantics, so this data cannot be used directly: the noisily labeled instances harm the training phase. In this paper, we present a novel Simple Weakly-supervised Contrastive Learning framework (SWCL). We use a contrastive learning strategy to pre-train a deep model on the large user-tagged data (referred to as weakly labeled data), and then fine-tune the pre-trained model on the small human-annotated data. We refine the contrastive loss function to better exploit inter-class contrastive patterns, making contrastive learning more applicable to the weakly supervised setting. In addition, multiple sampling over different sentiment pairs reduces the negative impact of label noise. SWCL captures the diverse sentiment semantics of weakly labeled data and improves their suitability for downstream sentiment classification tasks. Our method outperforms baseline methods in experiments on the Amazon review, Twitter, and SST-5 datasets. Even when fine-tuned on 0.5 percent of the training data (i.e., 32 instances), our framework significantly boosts the deep models' performance, demonstrating its robustness in a few-shot learning scenario.
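The abstract does not give SWCL's refined loss, but the idea of contrasting instances by their (possibly noisy) sentiment tags can be sketched with a standard supervised contrastive loss. This is an illustrative sketch only, not the paper's actual objective; the function name and temperature value are assumptions.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss sketch: for each anchor, pull together
    embeddings sharing its sentiment tag and push apart the rest."""
    # L2-normalize so similarities are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    mask_self = np.eye(n, dtype=bool)
    sim_exp = np.exp(sim) * ~mask_self        # exclude self-pairs
    losses = []
    for i in range(n):
        positives = (labels == labels[i]) & ~mask_self[i]
        if positives.sum() == 0:
            continue                           # anchor has no same-tag partner
        log_prob = np.log(sim_exp[i, positives] / sim_exp[i].sum())
        losses.append(-log_prob.mean())
    return float(np.mean(losses))
```

When embeddings that share a tag are close, the loss is small; mislabeled (noisy) instances raise it, which is why the framework's pair sampling and loss refinement matter in the weakly supervised setting.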
Multi-view clustering aims to leverage information from multiple views to improve clustering. Most previous works assumed that each view has complete data. However, in real-world datasets a view often contains some missing data, resulting in the incomplete multi-view clustering problem. Previous methods for this problem have at least one of the following drawbacks: (1) employing shallow models, which cannot handle the dependence and discrepancy among different views well; (2) ignoring the hidden information of the missing data; (3) being limited to the two-view case. To eliminate all these drawbacks, in this work we present an Adversarial Incomplete Multi-view Clustering (AIMC) method. Unlike most existing methods, which only learn a new representation from the existing views, AIMC seeks the common latent space of multi-view data and performs missing data inference simultaneously. In particular, element-wise reconstruction and a generative adversarial network (GAN) are integrated to infer the missing data; they aim to capture the overall structure and to obtain a deeper semantic understanding, respectively. Moreover, an aligned clustering loss is designed to obtain a better clustering structure. Experiments conducted on three datasets show that AIMC performs well and outperforms baseline methods.
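The combination the abstract describes, element-wise reconstruction plus a GAN term for inferring missing views, can be sketched as two loss components. This is an assumed illustrative form, not AIMC's published equations; the function name and signatures are hypothetical.

```python
import numpy as np

def aimc_style_losses(x_true, x_gen, d_real, d_fake, eps=1e-12):
    """Sketch of an AIMC-style objective: an element-wise reconstruction
    term (overall structure) plus standard GAN losses (semantic realism)
    for the inferred missing-view data."""
    # Element-wise reconstruction of the generated view against ground truth.
    recon = np.mean((x_true - x_gen) ** 2)
    # Discriminator: score real views high (d_real), generated views low (d_fake).
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    # Generator: make the discriminator score generated views as real.
    g_loss = -np.mean(np.log(d_fake + eps))
    return recon, d_loss, g_loss
```

In a full model these terms would be minimized jointly with an aligned clustering loss over the shared latent space, alternating generator and discriminator updates as in standard GAN training.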
Multi-view data is common in real-world datasets, where different views describe distinct perspectives. To better summarize the consistent and complementary information in multi-view data, researchers have proposed various multi-view representation learning algorithms, typically based on factorization models. However, most previous methods focused on shallow factorization models, which cannot capture complex hierarchical information. Although a deep multi-view factorization model has been proposed recently, it fails to explicitly discern consistent and complementary information in multi-view data and does not consider conceptual labels. In this work we present a semi-supervised deep multi-view factorization method, named Deep Multi-view Concept Learning (DMCL). DMCL performs nonnegative factorization of the data hierarchically, and tries to capture semantic structures and explicitly model consistent and complementary information in multi-view data at the highest abstraction level. We develop a block coordinate descent algorithm for DMCL. Experiments conducted on image and document datasets show that DMCL performs well and outperforms baseline methods.
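The hierarchical nonnegative factorization underlying such models builds on single-layer NMF. As a minimal sketch (not DMCL's actual block coordinate descent algorithm), one layer can be computed with classic Lee-Seung multiplicative updates; deep variants stack layers so that one layer's coefficient matrix becomes the next layer's input.

```python
import numpy as np

def nmf_layer(X, rank, iters=200, seed=0):
    """One nonnegative factorization layer X ≈ W @ H via multiplicative
    updates. Hierarchical models feed H into a further factorization."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Small positive offset keeps factors strictly nonnegative at the start.
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    eps = 1e-9
    for _ in range(iters):
        # Lee-Seung updates monotonically decrease ||X - W H||_F^2.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The multiplicative form preserves nonnegativity without explicit projection, which is why it is a common building block for deep nonnegative factorization stacks.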