2013
DOI: 10.1007/s00521-013-1368-0

A review of feature selection methods based on mutual information

Abstract: In this work, we present a review of the state of the art of information-theoretic feature selection methods. The concepts of feature relevance, redundancy, and complementarity (synergy) are clearly defined, as well as the Markov blanket. The problem of optimal feature selection is defined. A unifying theoretical framework is described, which can retrofit successful heuristic criteria, indicating the approximations made by each method. A number of open problems in the field are p…
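The abstract's central quantities (relevance, redundancy, and the complementarity or synergy of feature pairs) can be illustrated with plug-in mutual information estimates on discrete data. The following Python sketch is not code from the paper; the XOR construction and all variable names are illustrative assumptions.

```python
# Illustrative sketch (not from the paper): plug-in mutual information
# estimates of relevance, redundancy, and complementarity on discrete data.
import numpy as np

def entropy(labels):
    """Empirical entropy of a discrete variable, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from counts."""
    xy = np.array([f"{a}|{b}" for a, b in zip(x, y)])
    return entropy(x) + entropy(y) - entropy(xy)

rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, size=5000)
x2 = rng.integers(0, 2, size=5000)
c = x1 ^ x2                    # class label: XOR of the two features
x3 = x1.copy()                 # an exact (redundant) copy of x1
x1x2 = np.array([f"{a}|{b}" for a, b in zip(x1, x2)])  # joint feature

print("relevance   I(X1;C)    ~", round(mutual_information(x1, c), 3))   # ~0 bit
print("relevance   I(X2;C)    ~", round(mutual_information(x2, c), 3))   # ~0 bit
print("redundancy  I(X1;X3)   ~", round(mutual_information(x1, x3), 3))  # ~1 bit
print("complement. I(X1,X2;C) ~", round(mutual_information(x1x2, c), 3)) # ~1 bit
```

Individually, each XOR input carries almost no information about the class, yet the pair determines it completely; this is exactly the complementarity (synergy) the abstract refers to.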

Cited by 913 publications (456 citation statements)
References: 49 publications

Citation statements (ordered by relevance):
“…The green and red lines define the upper and lower limits of the error, in which all features correlate. Here, we build a mutual information (MI) function [14] so we can quantify the relevance of one feature upon another in the random set, and this information is used to build the construct for Irr.F, since once our classifier learns, it will mature the Irr.F learning module as defined in Algorithm 1.…”
Section: Methods (mentioning)
confidence: 99%
“…However, their comparative study did not reveal any development where each feature can achieve a run-time predictive scoring and can be added or removed algorithmically as the learning process continues. Vergara and Estevez [14] reviewed feature selection methods. They presented updates on results in a unifying framework to retrofit successful heuristic criteria.…”
Section: Related Study (mentioning)
confidence: 99%
“…The embedded method is a component embedded in the machine-learning algorithm, and the most typical is the decision tree algorithm [63][64][65]. The filter method eliminates the training steps of the classifier and is therefore suitable for large-scale datasets and as a pre-selector for features [63,65,66]. The wrapper method is based on the performance of the machine-learning algorithm to evaluate the merits of the feature subset [67].…”
Section: Feature Selection Methods (mentioning)
confidence: 99%
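For orientation, here is a compact sketch of the three families this excerpt distinguishes, assuming scikit-learn and its bundled breast-cancer dataset (both illustrative choices, not taken from the cited works): a decision tree's own importances stand in for the embedded approach, mutual information with the class for the filter approach, and cross-validated accuracy of a candidate subset for the wrapper approach.

```python
# Compact sketch of the embedded, filter, and wrapper families named above.
# Assumes scikit-learn; the dataset and models are illustrative choices only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Embedded: importances fall out of the learning algorithm itself (a decision tree).
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
embedded_top5 = np.argsort(tree.feature_importances_)[::-1][:5]

# Filter: score features against the class without training the final classifier.
mi_scores = mutual_info_classif(X, y, random_state=0)
filter_top5 = np.argsort(mi_scores)[::-1][:5]

# Wrapper: judge a candidate subset by the classifier's cross-validated performance.
def subset_score(feature_idx):
    clf = LogisticRegression(max_iter=5000)
    return cross_val_score(clf, X[:, feature_idx], y, cv=5).mean()

print("embedded top-5 features:", embedded_top5)
print("filter   top-5 features:", filter_top5)
print("wrapper score of the filter top-5:", round(subset_score(filter_top5), 3))
```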
“…The entropy of variable X can be conditioned on variable Y as H(X|Y). If variable Y does not introduce any information which influences the uncertainty of X, in other words, if X and Y are statistically independent, the conditional entropy is maximised [Vergara and Esteves, 2014]. From this description, mutual information IG(X;Y) can be derived as H(X) − H(X|Y).…”
Section: Framework (mentioning)
confidence: 99%
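A small numeric check can make the quoted identity concrete. The sketch below (synthetic data, not from the cited works) estimates H(X|Y) from conditional distributions and I(X;Y) from its definition, and confirms that I(X;Y) = H(X) - H(X|Y) up to floating-point error.

```python
# Numeric check of the identity above, I(X;Y) = H(X) - H(X|Y),
# using plug-in estimates from a contingency table (synthetic, illustrative data).
import numpy as np

rng = np.random.default_rng(1)
y = rng.integers(0, 3, size=10000)
x = (y + rng.integers(0, 2, size=10000)) % 3     # x depends noisily on y

def entropy_bits(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Joint distribution p(x, y) from counts, with marginals.
joint = np.zeros((3, 3))
for xi, yi in zip(x, y):
    joint[xi, yi] += 1
joint /= joint.sum()
px, py = joint.sum(axis=1), joint.sum(axis=0)

# H(X|Y) = sum_y p(y) * H(X | Y=y), from the conditional columns.
H_x_given_y = sum(py[j] * entropy_bits(joint[:, j] / py[j])
                  for j in range(3) if py[j] > 0)

# I(X;Y) from its definition: sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) ).
I_xy = sum(joint[i, j] * np.log2(joint[i, j] / (px[i] * py[j]))
           for i in range(3) for j in range(3) if joint[i, j] > 0)

print("H(X) - H(X|Y) =", round(entropy_bits(px) - H_x_given_y, 4))
print("I(X;Y)        =", round(I_xy, 4))    # the two values agree
```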
“…Mutual information represents the amount of information mutually shared between variable X and Y . This definition is useful within the context of feature selection because it gives a way to quantify the relevance of a feature with respect to the class [Vergara and Esteves, 2014]. Therefore, using mutual information in the wrapper approach benefits both the optimal feature space search and the selection performance enhancement.…”
Section: Framework (mentioning)
confidence: 99%
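As a rough illustration of that point, the sketch below ranks features by their estimated mutual information with the class and then runs a simple wrapper-style forward search over that ranking. It assumes scikit-learn; the wine dataset, the logistic-regression evaluator, and the greedy acceptance rule are illustrative assumptions, not the procedure of the cited works.

```python
# Sketch of MI-guided, wrapper-style forward selection, as suggested above.
# Assumes scikit-learn; dataset, evaluator, and acceptance rule are illustrative.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

# Rank candidate features by their estimated MI with the class (relevance).
order = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]

# Grow the subset in MI order; keep a feature only if CV accuracy improves.
selected, best = [], 0.0
for f in order:
    trial = selected + [int(f)]
    score = cross_val_score(LogisticRegression(max_iter=5000),
                            X[:, trial], y, cv=5).mean()
    if score > best:
        selected, best = trial, score

print("selected features:", selected)
print("cross-validated accuracy:", round(best, 3))
```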