2007
DOI: 10.1109/tnn.2007.891630

Maximization of Mutual Information for Supervised Linear Feature Extraction

Abstract: In this paper, we present a novel scheme for linear feature extraction in classification. The method is based on the maximization of the mutual information (MI) between the extracted features and the classes. The sum of the MI corresponding to each of the features is taken as a heuristic that approximates the MI of the whole output vector. Then, a component-by-component gradient-ascent method is proposed for the maximization of the MI, similar to the gradient-based entropy optimization used in independent component analysis…
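The abstract outlines the core loop: projection directions are learned one at a time by gradient ascent on an estimate of I(w^T x; C), and the sum of the per-component MIs stands in for the MI of the full projected vector. The sketch below illustrates that loop under stated assumptions: it substitutes scikit-learn's nonparametric MI estimator and a finite-difference gradient for the paper's analytic entropy-based gradient, and the function names (mi_of_projection, extract_component, mmi_features) and the deflation step are illustrative, not the authors' implementation.

```python
# Hypothetical sketch of component-by-component MI maximization for
# supervised linear feature extraction. The paper derives analytic
# gradients of an entropy-based MI estimator (as in ICA); this sketch
# instead uses sklearn's nonparametric estimator plus finite differences.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mi_of_projection(w, X, y):
    """Estimate I(w^T x; C) for one candidate projection direction w."""
    z = X @ w
    return mutual_info_classif(z.reshape(-1, 1), y, random_state=0)[0]

def extract_component(X, y, n_steps=50, lr=0.5, eps=1e-3, seed=None):
    """Gradient-ascend one unit-norm projection to maximize its MI with y."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(n_steps):
        # Finite-difference approximation of the MI gradient.
        base = mi_of_projection(w, X, y)
        grad = np.zeros(d)
        for j in range(d):
            w_pert = w.copy()
            w_pert[j] += eps
            grad[j] = (mi_of_projection(w_pert, X, y) - base) / eps
        w += lr * grad
        w /= np.linalg.norm(w)  # keep the direction unit-norm
    return w

def mmi_features(X, y, n_components=2):
    """Extract components one at a time; the sum of their individual MIs
    is the heuristic standing in for the MI of the whole output vector."""
    W, X_res = [], X.copy()
    for _ in range(n_components):
        w = extract_component(X_res, y)
        W.append(w)
        # Deflation: remove the extracted direction so later components
        # capture complementary information (one simple choice; the
        # paper's exact decorrelation scheme may differ).
        X_res = X_res - np.outer(X_res @ w, w)
    return np.array(W)
```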

Cited by 77 publications (50 citation statements, 2008–2021)
References 9 publications
“…Feature extraction usually acts as a preprocessing stage for a purpose such as classification or approximation of the raw data, and is defined as the problem of extracting the information most relevant to that purpose from the raw data [17], [18], [19], [20], [29]. Classical vector-pattern-oriented feature extraction (VecFE) methods such as Principal Component Analysis (PCA) [22] and Linear Discriminant Analysis (LDA) [23] have been widely used in a variety of areas, but can be ineffective due to their high computational cost and the singularity problem.…”
Section: Related Work
confidence: 99%
“…In general, it is desirable to keep the dimensionality of the input features as small as possible, to reduce both the computational cost of training a classifier and its complexity (Torkkola, 2003; Murillo & Rodriguez, 2007). Moreover, using a large number of features when the amount of data is small can degrade classification performance (Chow & Huang, 2005).…”
Section: Introduction
confidence: 99%
“…Moreover, using a large number of features when the amount of data is small can degrade classification performance (Chow & Huang, 2005). The number of input features can be reduced either by selecting useful features and discarding the rest (feature selection) (Battiti, 1994; Kwak & Choi, 2002; Peng et al, 2005; Estèvez et al, 2009; Sindhwani et al, 2004) or by extracting from the original features new ones that carry maximal information about the class label (feature extraction) (Torkkola, 2003; Hild II et al, 2006; Kwak, 2007; Murillo & Rodriguez, 2007). In this paper, we focus on feature selection methods.…”
Section: Introduction
confidence: 99%
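Since this citing paper turns to feature selection rather than extraction, a minimal sketch of the greedy MI-based selection it cites (in the spirit of Battiti's MIFS, 1994) may clarify the contrast: features are kept whole rather than projected, and redundancy with already-selected features is penalized. The beta weight, bin count, and estimator choices below are illustrative assumptions, not Battiti's exact formulation.

```python
# Greedy MIFS-style selection: at each step pick the feature with the
# highest MI with the class label, minus a penalty for MI shared with
# features already selected.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mifs_select(X, y, k, beta=0.5, n_bins=10):
    n_features = X.shape[1]
    # Discretize features so feature-feature MI can use a plug-in estimator.
    Xd = np.stack(
        [np.digitize(X[:, j], np.histogram_bin_edges(X[:, j], n_bins))
         for j in range(n_features)], axis=1)
    relevance = mutual_info_classif(X, y, random_state=0)  # I(f_j; C)
    selected, remaining = [], list(range(n_features))
    for _ in range(k):
        def score(j):
            redundancy = sum(mutual_info_score(Xd[:, j], Xd[:, s])
                             for s in selected)
            return relevance[j] - beta * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```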