Feature selection aims to retain relevant features for improved classification performance and to remove redundant features for reduced computational cost. Balancing these two factors is difficult, especially when categorical labels are costly to obtain. In this paper, we address this problem with a semisupervised learning method and propose a max-relevance and min-redundancy criterion based on Pearson's correlation coefficient (RRPC). The new method uses an incremental search technique to select optimal feature subsets. The newly selected features have strong relevance to the labels in a supervised manner and avoid redundancy with the already-selected features under unsupervised constraints. Comparative studies are performed on binary and multicategory benchmark data sets. The results show that RRPC achieves a good balance between relevance and redundancy in semisupervised feature selection. We also compare RRPC with classic supervised feature selection criteria (such as mRMR and the Fisher score), unsupervised criteria (such as the Laplacian score), and semisupervised criteria (such as sSelect and locality sensitive). Experimental results demonstrate the effectiveness of our method.
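As an illustration of the general idea, the sketch below shows a greedy, incremental max-relevance / min-redundancy selection loop built on Pearson's correlation: relevance is measured against the labels on the labeled samples (supervised part), while redundancy is measured among already-selected features on all samples (unsupervised part). The function names (`pearson`, `rrpc_select`) and the simple relevance-minus-mean-redundancy score are assumptions made here for illustration only; the paper's exact RRPC formulation may weight or combine the terms differently.

```python
# Minimal sketch of incremental max-relevance / min-redundancy selection
# with Pearson's correlation. The score used here (relevance minus mean
# redundancy) is an assumption, not the paper's exact criterion.
import numpy as np

def pearson(a, b):
    """Absolute Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else abs((a * b).sum() / denom)

def rrpc_select(X, y, labeled_idx, n_select):
    """Greedy forward selection.

    X           : (n_samples, n_features) matrix, labeled + unlabeled rows
    y           : labels of the labeled samples only
    labeled_idx : row indices of the labeled samples in X
    n_select    : number of features to pick
    """
    n_features = X.shape[1]
    X_lab = X[labeled_idx]
    # Relevance: correlation with the labels (supervised part).
    relevance = np.array([pearson(X_lab[:, j], y) for j in range(n_features)])

    selected, remaining = [], list(range(n_features))
    for _ in range(n_select):
        best_j, best_score = None, -np.inf
        for j in remaining:
            # Redundancy: mean correlation with already-selected features,
            # computed on all samples (unsupervised part).
            red = (np.mean([pearson(X[:, j], X[:, s]) for s in selected])
                   if selected else 0.0)
            score = relevance[j] - red
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

For example, `rrpc_select(X, y, labeled_idx, 10)` would return the indices of ten features chosen incrementally under this assumed score; only the relevance term touches the labels, which is what lets the redundancy term exploit the unlabeled data.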