With the advent of the big data era and the rapid growth in the scale of raw data, feature selection has become a basic and critical technology for data mining. However, most previous studies on feature selection treat either single features or the overall feature subset, and ignore the influence that correlation and redundancy among the features within a subset have on classification results. In this paper, a hybrid feature selection method based on feature subset generation through factor analysis (FAFS_HFS) is proposed, combining feature subset grouping with factor analysis (FA). First, feature subsets are generated through factor analysis according to the maximum loading (maximum explanatory power) of each feature. Then, minimum redundancy maximum relevance (mRMR) and sequential forward selection (SFS) are used to remove redundancy within each feature subset. Finally, the Fisher score (F-score) and feature subset SFS (FS_SFS) are used to evaluate and select feature subsets and to obtain the optimal feature subset. Experiments are conducted on 14 datasets. The results show that FAFS_HFS achieves high classification accuracy and dimension reduction on almost all datasets, especially high-dimensional ones, and that it offers excellent efficiency and competitive classification performance compared with the other methods considered.

INDEX TERMS hybrid feature selection, feature subset, factor analysis (FA), mRMR, F-score, sequential forward selection (SFS)
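The first and last stages described in the abstract (grouping features by their largest factor loading, and scoring features with the Fisher score) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `group_features_by_loading` and `fisher_score` are hypothetical, and the loadings are approximated by principal-component extraction from the correlation matrix with no rotation, which is an assumption made here because the abstract does not specify the extraction method. The mRMR and SFS stages are omitted.

```python
import numpy as np


def group_features_by_loading(X, n_factors):
    """Group feature indices by the factor on which each loads most strongly.

    Assumption: loadings come from principal-component extraction on the
    correlation matrix (no rotation); the paper's FA variant may differ.
    Features with zero variance are not handled.
    """
    X = np.asarray(X, dtype=float)
    # Correlation matrix between features (columns of X).
    R = np.corrcoef(X, rowvar=False)
    # eigh returns eigenvalues in ascending order; keep the largest n_factors.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_factors]
    # Loading of feature i on factor k = eigenvector entry * sqrt(eigenvalue).
    loadings = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))
    # Each feature joins the subset of its maximum absolute loading.
    assignment = np.argmax(np.abs(loadings), axis=1)
    return [np.flatnonzero(assignment == k).tolist() for k in range(n_factors)]


def fisher_score(X, y):
    """Per-feature Fisher score: between-class over within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += Xc.shape[0] * (Xc.mean(axis=0) - overall_mean) ** 2
        within += Xc.shape[0] * Xc.var(axis=0)
    # Small constant guards against division by zero.
    return between / (within + 1e-12)
```

In a full pipeline along the lines the abstract describes, each group returned by `group_features_by_loading` would next be pruned for redundancy, and the Fisher scores would help rank the resulting subsets. How exactly those steps are combined is not stated in the abstract and is left open here.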