A Method to Clustering the Feature Ranking on Data Classification Using an Ensemble Feature Selection

Kaoungku, Nuntawut; Kerdprasop, Kittisak; Kerdprasop, Nittaya

doi:10.18178/ijfcc.2017.6.3.494

Cited by 5 publications

(2 citation statements)

References 12 publications


“…This research, thus, aims at extending the previous work of Nuntawut et al [7], [8] by proposing a silhouette width criterion for automatic setting of initial cluster numbers. We also add confidence criteria into feature selection based on association rule mining technique to increase performance.…”

Section: Introduction (mentioning)

confidence: 97%

“…But this feature selection algorithm does not work automatically because human is the one who select the features one by one based on the feature scores reported from the algorithm. Therefore, Nuntawut et al [8] improved the algorithm by proposing clustering technique to cluster the feature scores to assist users on finding an appropriate groups of features. The clustering process is supposed to be automatic in the sense that the number of clusters should be judged by the process itself.…”

Section: Introduction (mentioning)

confidence: 99%


The Silhouette Width Criterion for Clustering and Association Mining to Select Image Features

Kaoungku, Suksut, Chanklan et al. 2018

IJMLC

Self Cite


Image data are normally unstructured and high-dimensional because advances in photographic technology allow an image to be taken at a wide range of resolution levels. To overcome this problem, data miners may consider selecting only a minimal set of features that are really important for classifying their images. Feature selection is a popular method for reducing the dimensionality of data. However, most feature selection algorithms return their results in the form of a score for each feature. It is still difficult for data miners to choose features based on such a scoring scheme because they may not know which score range is best for the data classification task at hand. Therefore, in this research, we aim to assist data miners and novice data analysts in solving the dimensionality problem by finding the optimal set of features for them, instead of just reporting the scores of all features and leaving the burden of selection to the miners. We select the optimal set of features by first applying a clustering technique to group similar features based on their scores. We propose the silhouette width criterion for selecting the optimal number of clusters during the cluster analysis step. After that, we perform association mining to analyze relationships that may exist among different subsets of features with respect to the target attribute. Finally, our method reports to the user the best subset of features, which can then be used for data classification. We demonstrate the performance of our proposed method on satellite forest image data from Japan.
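The clustering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, so a simple one-dimensional k-means is assumed here, and the function names (`kmeans_1d`, `silhouette_width`, `choose_k_by_silhouette`) and the feature scores are hypothetical. The association mining step is not covered.

```python
from statistics import mean


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on scalar scores (assumed algorithm, evenly spaced start)."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign every score to its nearest center.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = mean(members)
    return labels


def silhouette_width(values, labels):
    """Average silhouette width over all points, using absolute difference as distance."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; treated as 0.0 in this sketch
    widths = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(mean([abs(v - u) for u in other])
                for other_lab, other in clusters.items() if other_lab != lab)
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return mean(widths)


def choose_k_by_silhouette(values, max_k):
    """Try k = 2..max_k and keep the smallest k with the largest average width."""
    best = None
    for k in range(2, max_k + 1):
        labels = kmeans_1d(values, k)
        width = silhouette_width(values, labels)
        if best is None or width > best[2]:
            best = (k, labels, width)
    return best


# Hypothetical feature scores, e.g. as returned by a feature-ranking algorithm.
scores = [0.91, 0.88, 0.85, 0.42, 0.40, 0.38, 0.05, 0.04, 0.02]
k, labels, width = choose_k_by_silhouette(scores, max_k=4)
print(k, labels, round(width, 3))
```

In this sketch the number of clusters is picked by the data rather than by the user, which is the role the abstract assigns to the silhouette width criterion. A requested k can leave some clusters empty, in which case the width is computed over the non-empty clusters only.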
