An ensemble of classifiers is created by combining predictions of multiple component classifiers for improving prediction performance. In this paper, we conduct experimental comparison of J48, NB, IBK on nine microarray cancer datasets and also analyze their performance with Bagging, Boosting and Stack Generalization. The experimental results show that all ensemble methods outperform the individual classification methods. We then present a method, referred to as SD-EnClass, for combining classifiers from different classification families into an ensemble, based on a simple estimation of each classifier's class performance. The experimental results show that the proposed model improves classification accuracy, in comparison to simply selecting the best classifier in the combination. In the second stage, we combine the results of our proposed method with the results of Boosting, Bagging and Stacking using the combining method proposed, to obtain results which are significantly better than using Boosting, Bagging or Stacking alone.
When clustering high dimensional data, traditional clustering methods are found to be lacking since they consider all of the dimensions of the dataset in discovering clusters whereas only some of the dimensions are relevant. This may give rise to subspaces within the dataset where clusters may be found. Using feature selection, we can remove irrelevant and redundant dimensions by analyzing the entire dataset. The problem of automatically identifying clusters that exist in multiple and maybe overlapping subspaces of high dimensional data, allowing better clustering of the data points, is known as Subspace Clustering. There are two major approaches to subspace clustering based on search strategy. Top-down algorithms find an initial clustering in the full set of dimensions and evaluate the subspaces of each cluster, iteratively improving the results. Bottom-up approaches start from finding low dimensional dense regions, and then use them to form clusters. Based on a survey on subspace clustering, we identify the challenges and issues involved with clustering gene expression data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.