Feature Selection via Correlation Coefficient Clustering

Hsu, Hui-Huang; Hsieh, Cheng-Wei

doi:10.4304/jsw.5.12.1371-1377

Cited by 80 publications

(45 citation statements)

References 22 publications


“…The method is based on calculating the entropy of each feature according to the distribution of frequency on different classes. Another type of method, proposed by Hsu and Hsieh, is a feature selection algorithm based on correlation coefficient clustering [10]. The method collects the features into clusters by measuring their correlation coefficients, and then the most class-dependent feature in each cluster is selected.…”

Section: A Feature Selection (mentioning)

confidence: 99%
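
The two steps named in the statement above (group features by their correlation coefficients, then keep the most class-dependent feature of each group) can be sketched as follows. This is a minimal illustration, not the algorithm of Hsu and Hsieh: the greedy threshold grouping, the `threshold` value, and the use of the absolute Pearson correlation with numeric class labels as the "class dependence" score are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant sequence: treat as uncorrelated
    return cov / (sx * sy)

def cluster_features(columns, threshold=0.8):
    """Assumed grouping rule: a feature joins the first cluster whose
    first member it correlates with at |r| >= threshold."""
    clusters = []
    for j, col in enumerate(columns):
        for cluster in clusters:
            if abs(pearson(col, columns[cluster[0]])) >= threshold:
                cluster.append(j)
                break
        else:
            clusters.append([j])  # no sufficiently similar cluster found
    return clusters

def select_features(columns, labels, threshold=0.8):
    """Keep, per cluster, the feature most correlated with the labels
    (an assumed stand-in for 'most class-dependent')."""
    clusters = cluster_features(columns, threshold)
    return [
        max(cluster, key=lambda j: abs(pearson(columns[j], labels)))
        for cluster in clusters
    ]
```

Here `columns` is a list of feature columns (one list of values per feature) and `labels` is a list of numeric class labels. The function returns one feature index per cluster, so the number of selected features follows from the threshold rather than being fixed in advance.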

Feature Reduction for Text Categorization Using Cluster-Based Discriminant Coefficient

Gao

Chien

2012

2012 Conference on Technologies and Applications of Artificial Intelligence


Text classification is an important research topic for managing large numbers of electronic documents. Feature reduction is the key issue for text classification with high-dimensional keywords. A document analysis method called discriminant coefficient was proposed to reduce features and achieve high-precision text classification. However, the main problem of the discriminant-based feature reduction method is that the final number of reduced features is exactly equal to the number of document classes. Although the precision of classification is high with such a method, the recall is relatively low. In this paper, we propose an improvement on the analysis method of discriminant coefficients. We apply a simple clustering method to distinguish the documents in each document class in order to preserve hidden differences among keywords in the same class. The clustering results help to adjust the number of reduced features flexibly. The experimental results show that the proposed clustering mechanism supports adaptive feature reduction and that both recall and F1 are improved.


“…The feature transformation follows the idea of correlation coefficient clustering proposed by Hsu and Hsieh (2010), in which data points with similar features are grouped in clusters when using their mutual correlation coefficients. Since it can be assumed that berries in one image have similar features, the new features l (·),c of the candidates are derived from the median correlation to the reference patches…”

Section: Feature Extraction (mentioning)

confidence: 99%
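
The idea in the statement above, deriving a candidate's feature from its median correlation to a set of reference patches, can be illustrated with a short sketch. The names `median_correlation`, `candidate`, and `references` are hypothetical, and representing each patch as a flat list of numbers is an assumption; the cited paper's actual feature definition is not given in the excerpt.

```python
import math
from statistics import median

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant vector: treat as uncorrelated
    return cov / (sx * sy)

def median_correlation(candidate, references):
    """Median of the correlations between one candidate feature vector
    and every reference feature vector."""
    return median(pearson(candidate, ref) for ref in references)
```

Using the median rather than the mean keeps the derived value stable when a few reference patches are atypical, which fits the stated assumption that berries within one image look alike.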

Automated image analysis framework for high-throughput determination of grapevine berry sizes using conditional random fields

Roscher

Herzog

Kunkel

et al. 2014

Computers and Electronics in Agriculture


The berry size is one of the most important fruit traits in grapevine breeding. Non-invasive, image-based phenotyping promises a fast and precise method for monitoring grapevine berry size. In the present study, an automated image analysis framework was developed to estimate the size of grapevine berries from images in a high-throughput manner. The framework includes (i) the detection of circular structures which are potentially berries and (ii) the classification of these into the class 'berry' or 'non-berry' by utilizing a conditional random field. The approach used the concept of one-class classification, since only the target class 'berry' is of interest and needs to be modeled. Moreover, the classification was carried out using an automated active learning approach, i.e. no user interaction is required during the classification process and, in addition, the process adapts automatically to changing image conditions, e.g. illumination or berry color. The framework was tested on three datasets comprising a total of 139 images. The images were taken in an experimental vineyard at different stages of grapevine growth according to the BBCH scale. The mean berry size of a plant estimated by the framework correlates with the manually measured berry size with a coefficient of 0.88.


“…Recently many researchers have introduced clustering techniques into the field of feature selection (Liu et al 2010; Sotoca and Pla 2010; Jung et al 2011). In these selection methods, all candidate features F are first grouped into different clusters in terms of prespecified similarity criteria, such as correlation coefficient, MI, and conditional MI (Hsu and Hsieh 2010; Liu et al 2010; Sotoca and Pla 2010; Jung et al 2011). As a result, the features in the same cluster are highly correlated to each other.…”

Section: Feature Selection Using Information Criteria (mentioning)

confidence: 99%

A New Supervised Feature Selection Method for Pattern Classification

Liu

Zhang

2012

Computational Intelligence


With the rapid development of information techniques, the dimensionality of data in many application domains, such as text categorization and bioinformatics, is getting higher and higher. High-dimensional data may cause many problems for traditional learning algorithms in pattern classification, such as overfitting, poor performance, and low efficiency. Feature selection aims at reducing the dimensionality of data and providing discriminative features for pattern learning algorithms. Due to its effectiveness, feature selection is attracting increasing attention from a variety of disciplines, and many efforts have been made in this field. In this paper, we propose a new supervised feature selection method that picks important features by using information criteria. Unlike other selection methods, the main characteristic of our method is that it not only takes both maximal relevance to the class labels and minimal redundancy to the selected features into account, but also works like feature clustering in an agglomerative way. To measure the relevance and redundancy of features exactly, two different information criteria, i.e., mutual information and coefficient of relevance, have been adopted in our method. The performance evaluations on 12 benchmark data sets show that the proposed method can achieve better performance than other popular feature selection methods in most cases.
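
The trade-off the abstract describes, maximal relevance to the class labels against minimal redundancy with already selected features, can be sketched with a generic greedy scheme. This is an mRMR-style illustration under stated assumptions, not the method of Liu and Zhang: it uses only mutual information on discrete features (the paper's coefficient of relevance and agglomerative clustering step are not reproduced), and the score "relevance minus mean redundancy" as well as the names `greedy_select` and `k` are choices made for the example.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences,
    estimated from empirical frequencies."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )

def greedy_select(columns, labels, k):
    """Pick up to k feature indices; each step takes the feature with the
    highest relevance to the labels minus its mean redundancy with the
    features chosen so far (assumed scoring rule)."""
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            relevance = mutual_information(columns[j], labels)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a feature that duplicates an already selected one, the redundancy term cancels most of its relevance, so a weaker but complementary feature is preferred at the next step.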
