2013
DOI: 10.1007/s00521-013-1368-0

A review of feature selection methods based on mutual information

Abstract: In this work, we present a review of the state of the art of information-theoretic feature selection methods. The concepts of feature relevance, redundancy, and complementarity (synergy) are clearly defined, as well as the Markov blanket. The problem of optimal feature selection is defined. A unifying theoretical framework is described, which can retrofit successful heuristic criteria, indicating the approximations made by each method. A number of open problems in the field are p…
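The abstract's central quantities (relevance, redundancy, and the complementarity or synergy of feature pairs) can be illustrated with plug-in mutual information estimates on discrete data. The following Python sketch is not code from the paper; the XOR construction and all variable names are illustrative assumptions.

```python
# Illustrative sketch (not from the paper): plug-in mutual information
# estimates of relevance, redundancy, and complementarity on discrete data.
import numpy as np

def entropy(labels):
    """Empirical entropy of a discrete variable, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from counts."""
    xy = np.array([f"{a}|{b}" for a, b in zip(x, y)])
    return entropy(x) + entropy(y) - entropy(xy)

rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, size=5000)
x2 = rng.integers(0, 2, size=5000)
c = x1 ^ x2                    # class label: XOR of the two features
x3 = x1.copy()                 # an exact (redundant) copy of x1
x1x2 = np.array([f"{a}|{b}" for a, b in zip(x1, x2)])  # joint feature

print("relevance   I(X1;C)    ~", round(mutual_information(x1, c), 3))   # ~0 bit
print("relevance   I(X2;C)    ~", round(mutual_information(x2, c), 3))   # ~0 bit
print("redundancy  I(X1;X3)   ~", round(mutual_information(x1, x3), 3))  # ~1 bit
print("complement. I(X1,X2;C) ~", round(mutual_information(x1x2, c), 3)) # ~1 bit
```

Individually, each XOR input carries almost no information about the class, yet the pair determines it completely; this is exactly the complementarity (synergy) the abstract refers to.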

Cited by 913 publications (456 citation statements)
References: 49 publications

Citation statements (ordered by relevance):
“…The green and red lines define the upper and lower limits of the error, in which all features correlate. Here, we build a mutual information (MI) function [14] so we can quantify the relevance of one feature upon another in the random set, and this information is used to build the construct for Irr.F, since once our classifier learns, it will mature the Irr.F learning module as defined in Algorithm 1.…”
Section: Methods (mentioning)
confidence: 99%
“…However, their comparative study did not reveal any development where each feature can achieve a run-time predictive scoring and can be added or removed algorithmically as the learning process continues. Vergara and Estevez [14] reviewed feature selection methods. They presented updates on results in a unifying framework to retrofit successful heuristic criteria.…”
Section: Related Study (mentioning)
confidence: 99%
“…The embedded method is a component embedded in the machine-learning algorithm, and the most typical is the decision tree algorithm [63][64][65]. The filter method eliminates the training steps of the classifier and is therefore suitable for large-scale datasets and as a pre-selector for features [63,65,66]. The wrapper method is based on the performance of the machine-learning algorithm to evaluate the merits of the feature subset [67].…”
Section: Feature Selection Methods (mentioning)
confidence: 99%
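For orientation, here is a compact sketch of the three families this excerpt distinguishes, assuming scikit-learn and its bundled breast-cancer dataset (both illustrative choices, not taken from the cited works): a decision tree's own importances stand in for the embedded approach, mutual information with the class for the filter approach, and cross-validated accuracy of a candidate subset for the wrapper approach.

```python
# Compact sketch of the embedded, filter, and wrapper families named above.
# Assumes scikit-learn; the dataset and models are illustrative choices only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Embedded: importances fall out of the learning algorithm itself (a decision tree).
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
embedded_top5 = np.argsort(tree.feature_importances_)[::-1][:5]

# Filter: score features against the class without training the final classifier.
mi_scores = mutual_info_classif(X, y, random_state=0)
filter_top5 = np.argsort(mi_scores)[::-1][:5]

# Wrapper: judge a candidate subset by the classifier's cross-validated performance.
def subset_score(feature_idx):
    clf = LogisticRegression(max_iter=5000)
    return cross_val_score(clf, X[:, feature_idx], y, cv=5).mean()

print("embedded top-5 features:", embedded_top5)
print("filter   top-5 features:", filter_top5)
print("wrapper score of the filter top-5:", round(subset_score(filter_top5), 3))
```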
“…The entropy of variable X can be conditioned on variable Y as H(X|Y). If variable Y does not introduce any information which influences the uncertainty of X, in other words, if X and Y are statistically independent, the conditional entropy is maximised [Vergara and Esteves, 2014]. From this description, mutual information IG(X;Y) can be derived as H(X) − H(X|Y).…”
Section: Framework (mentioning)
confidence: 99%
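A small numeric check can make the quoted identity concrete. The sketch below (synthetic data, not from the cited works) estimates H(X|Y) from conditional distributions and I(X;Y) from its definition, and confirms that I(X;Y) = H(X) - H(X|Y) up to floating-point error.

```python
# Numeric check of the identity above, I(X;Y) = H(X) - H(X|Y),
# using plug-in estimates from a contingency table (synthetic, illustrative data).
import numpy as np

rng = np.random.default_rng(1)
y = rng.integers(0, 3, size=10000)
x = (y + rng.integers(0, 2, size=10000)) % 3     # x depends noisily on y

def entropy_bits(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Joint distribution p(x, y) from counts, with marginals.
joint = np.zeros((3, 3))
for xi, yi in zip(x, y):
    joint[xi, yi] += 1
joint /= joint.sum()
px, py = joint.sum(axis=1), joint.sum(axis=0)

# H(X|Y) = sum_y p(y) * H(X | Y=y), from the conditional columns.
H_x_given_y = sum(py[j] * entropy_bits(joint[:, j] / py[j])
                  for j in range(3) if py[j] > 0)

# I(X;Y) from its definition: sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) ).
I_xy = sum(joint[i, j] * np.log2(joint[i, j] / (px[i] * py[j]))
           for i in range(3) for j in range(3) if joint[i, j] > 0)

print("H(X) - H(X|Y) =", round(entropy_bits(px) - H_x_given_y, 4))
print("I(X;Y)        =", round(I_xy, 4))    # the two values agree
```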
“…Mutual information represents the amount of information mutually shared between variable X and Y . This definition is useful within the context of feature selection because it gives a way to quantify the relevance of a feature with respect to the class [Vergara and Esteves, 2014]. Therefore, using mutual information in the wrapper approach benefits both the optimal feature space search and the selection performance enhancement.…”
Section: Framework (mentioning)
confidence: 99%
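As a rough illustration of that point, the sketch below ranks features by their estimated mutual information with the class and then runs a simple wrapper-style forward search over that ranking. It assumes scikit-learn; the wine dataset, the logistic-regression evaluator, and the greedy acceptance rule are illustrative assumptions, not the procedure of the cited works.

```python
# Sketch of MI-guided, wrapper-style forward selection, as suggested above.
# Assumes scikit-learn; dataset, evaluator, and acceptance rule are illustrative.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

# Rank candidate features by their estimated MI with the class (relevance).
order = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]

# Grow the subset in MI order; keep a feature only if CV accuracy improves.
selected, best = [], 0.0
for f in order:
    trial = selected + [int(f)]
    score = cross_val_score(LogisticRegression(max_iter=5000),
                            X[:, trial], y, cv=5).mean()
    if score > best:
        selected, best = trial, score

print("selected features:", selected)
print("cross-validated accuracy:", round(best, 3))
```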