In this study, an observation points‐based positive‐unlabeled learning algorithm (hereafter called OP‐PUL) is proposed to deal with positive‐unlabeled learning (PUL) tasks by judiciously assigning highly credible labels to unlabeled samples. The proposed OP‐PUL algorithm has three components. First, an observation point classifier ensemble (OPCE) algorithm is constructed to divide the unlabeled samples into two categories: temporary positive samples and permanent negative samples. Second, a temporary OPC (TOPC) is trained on the combination of the original positive samples and the permanent negative samples, and the temporary positive samples that are correctly classified by TOPC are retained as permanent positive samples. Third, a permanent OPC (POPC) is finally trained on the combination of the original positive samples, the permanent positive samples and the permanent negative samples. An exhaustive experimental evaluation is conducted on 30 benchmark PU data sets to validate the feasibility, rationality and effectiveness of the OP‐PUL algorithm. Results show that (1) the OP‐PUL algorithm remains stable and robust as the numbers of unlabeled and positive samples in the unlabeled data sets increase, and (2) the permanent positive samples have a probability distribution consistent with that of the original positive samples. Moreover, a statistical analysis reveals that POPC in the OP‐PUL algorithm yields better PUL performance on the 30 data sets than four well‐known PUL algorithms. This demonstrates that OP‐PUL is a viable algorithm for PUL tasks.
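The three-stage workflow described above can be illustrated schematically. The sketch below is a minimal illustration only, assuming a generic scikit-learn random forest as a stand-in for the observation point classifiers and a simple probability threshold for the OPCE split; the paper's actual OPCE, TOPC and POPC constructions differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def op_pul_sketch(X_pos, X_unl, threshold=0.5, seed=0):
    """Illustrative three-stage OP-PUL-style procedure (stand-in classifiers)."""
    # Stage 1: score unlabeled samples and split them into temporary
    # positives and permanent negatives (stand-in for the OPCE step).
    X1 = np.vstack([X_pos, X_unl])
    y1 = np.hstack([np.ones(len(X_pos)), np.zeros(len(X_unl))])
    opce = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X1, y1)
    scores = opce.predict_proba(X_unl)[:, 1]
    temp_pos = X_unl[scores >= threshold]   # temporary positive samples
    perm_neg = X_unl[scores < threshold]    # permanent negative samples

    # Stage 2: train a temporary classifier (TOPC) on original positives plus
    # permanent negatives; temporary positives it still labels as positive
    # are retained as permanent positives.
    X2 = np.vstack([X_pos, perm_neg])
    y2 = np.hstack([np.ones(len(X_pos)), np.zeros(len(perm_neg))])
    topc = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X2, y2)
    perm_pos = temp_pos[topc.predict(temp_pos) == 1] if len(temp_pos) else temp_pos

    # Stage 3: train the final classifier (POPC) on original positives,
    # permanent positives and permanent negatives.
    X3 = np.vstack([X_pos, perm_pos, perm_neg])
    y3 = np.hstack([np.ones(len(X_pos) + len(perm_pos)), np.zeros(len(perm_neg))])
    popc = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X3, y3)
    return popc
```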
An obvious defect of the extreme learning machine (ELM) is that its prediction performance is sensitive to the random initialization of the input-layer weights and hidden-layer biases. To make ELM insensitive to random initialization, GPRELM adopts the simple and effective strategy of integrating Gaussian process regression into ELM. However, the kernel-based GPRELM (kGPRELM) suffers from a serious overfitting problem. In this paper, we investigate the theoretical reasons for the overfitting of kGPRELM and further propose a correlation-based GPRELM (cGPRELM), which uses a correlation coefficient to measure the similarity between two different hidden-layer output vectors. cGPRELM reduces the likelihood that the covariance matrix becomes an identity matrix as the number of hidden-layer nodes increases, thereby effectively controlling overfitting. Furthermore, cGPRELM works well for improper initialization intervals in which ELM and kGPRELM fail to provide good predictions. Experimental results on real classification and regression data sets demonstrate the feasibility and superiority of cGPRELM, as it not only achieves better generalization performance but also has lower computational complexity.
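The correlation-based covariance idea can be sketched as follows. This is a minimal illustration, assuming a sigmoid random hidden layer and a standard Gaussian-process predictive mean; the hypothetical helpers `correlation_covariance` and `cgprelm_predict` are illustrative and not the paper's exact formulation.

```python
import numpy as np

def correlation_covariance(H1, H2):
    # Pearson correlation between every pair of hidden-layer output vectors,
    # used here as the GP covariance instead of a conventional kernel.
    H1c = H1 - H1.mean(axis=1, keepdims=True)
    H2c = H2 - H2.mean(axis=1, keepdims=True)
    num = H1c @ H2c.T
    den = np.outer(np.linalg.norm(H1c, axis=1), np.linalg.norm(H2c, axis=1))
    return num / den

def cgprelm_predict(X_train, y_train, X_test, n_hidden=200, noise=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(d, n_hidden))   # random input-layer weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)        # random hidden-layer biases
    sigmoid = lambda Z: 1.0 / (1.0 + np.exp(-Z))
    H_tr = sigmoid(X_train @ W + b)                  # hidden-layer outputs (train)
    H_te = sigmoid(X_test @ W + b)                   # hidden-layer outputs (test)
    K = correlation_covariance(H_tr, H_tr)           # train-train covariance
    K_star = correlation_covariance(H_te, H_tr)      # test-train covariance
    # GP predictive mean: K_* (K + noise * I)^{-1} y
    alpha = np.linalg.solve(K + noise * np.eye(len(X_train)), y_train)
    return K_star @ alpha
```

Because the correlation matrix is a normalized Gram matrix, it stays positive semidefinite, and the small noise term keeps the linear solve well conditioned.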