Feature selection is a critical step in the data-preprocessing phase of pattern recognition and machine learning. Its core task is to analyze and quantify the relevance, irrelevance, and redundancy between features and class labels. While existing feature selection methods offer various interpretations of these relationships, they ignore the multi-value bias of class-independent features and the redundancy of class-dependent features. Therefore, this paper proposes a feature selection method called Maximal Independent Classification Information and Minimal Redundancy (MICIMR). First, the relevance and redundancy terms of class-independent features are computed using the symmetric uncertainty coefficient. Second, the relevance and redundancy terms of class-dependent features are computed according to the independent classification information criterion. Finally, the selection criteria for these two kinds of features are combined. To verify the effectiveness of MICIMR, it is compared with five other feature selection methods on fifteen real datasets. The experimental results demonstrate that MICIMR outperforms the other feature selection algorithms in terms of redundancy rate as well as classification accuracy (Gmean_macro and F1_macro).
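The symmetric uncertainty coefficient mentioned above can be sketched in a few lines. This is a minimal illustration of the standard definition SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)), not the MICIMR implementation itself; all function and variable names are illustrative.

```python
# Minimal sketch of the symmetric uncertainty coefficient for discrete variables.
# SU normalizes mutual information into [0, 1]: 0 means independence, 1 means
# either variable fully determines the other. Names here are illustrative only.
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy H(X) of a discrete sequence, in bits."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for paired discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y))."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:  # both variables constant: define SU as 0
        return 0.0
    return 2 * mutual_information(xs, ys) / (hx + hy)
```

Because SU is normalized by the entropies of both variables, it compensates for the bias of raw mutual information toward features with many distinct values, which is why it suits the multi-value-bias problem the abstract describes.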
Feature selection is a key step in the analysis of high-dimensional, small-sample data. Its core task is to analyze and quantify the correlation between features and class labels and the redundancy between features. However, most existing feature selection algorithms consider only the classification contribution of individual features and ignore inter-feature redundancy and correlation. Therefore, this paper proposes a nonlinear dynamic conditional relevance feature selection algorithm (NDCRFS), developed through a study and analysis of existing feature selection ideas and methods. First, redundancy and relevance between features, and between features and class labels, are discriminated using mutual information, conditional mutual information, and interactive mutual information. Second, the selected features and candidate features are dynamically weighted using information gain factors. Finally, to evaluate its performance, NDCRFS was compared against six other feature selection algorithms on three classifiers, using twelve different datasets, measuring variability and classification metrics across the algorithms. The experimental results show that NDCRFS improves the quality of the feature subsets and obtains better classification results.
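The conditional mutual information used above to separate redundancy from relevance can be sketched directly from its entropy decomposition. This is a generic illustration of I(f; c | s) for a candidate feature f, class label c, and selected feature s, not the NDCRFS code; all names are illustrative.

```python
# Minimal sketch of conditional mutual information for discrete variables:
# I(f; c | s) = H(f, s) + H(c, s) - H(s) - H(f, c, s).
# A positive gap I(f; c | s) - I(f; c) indicates an interaction effect:
# f becomes informative about c only once s is known. Names are illustrative.
from collections import Counter
from math import log2

def joint_entropy(*cols):
    """Joint Shannon entropy of one or more aligned discrete sequences, in bits."""
    rows = list(zip(*cols))
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())

def conditional_mutual_information(f, c, s):
    """I(f; c | s) via the entropy decomposition above."""
    return (joint_entropy(f, s) + joint_entropy(c, s)
            - joint_entropy(s) - joint_entropy(f, c, s))
```

For example, if c is the XOR of f and s, the plain mutual information I(f; c) is zero, yet I(f; c | s) is one bit; this is exactly the kind of interactive relevance that pairwise criteria miss and that conditional terms recover.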