Mining of gene expression data is growing rapidly as a way to predict gene expression patterns and to assist clinicians in the early diagnosis of tumor formation. Clustering gene expression data is the most important phase, as it helps in finding groups of genes that are highly expressed or suppressed. This paper analyses the performance of the most representative hard and soft off-line clustering algorithms, namely K-Means, Fuzzy C-Means, Self Organizing Maps (SOM) based clustering and Genetic Algorithm (GA) based clustering, on a brain tumor gene expression dataset. The clusters produced by the clustering algorithms are indications of cellular processes. Clustering results are evaluated using clustering indices such as the Xie-Beni index (XB), Davies-Bouldin index (DB), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and Dunn's Index (DI), along with the time taken, to assess the compactness and separation of clusters. Experimental results show that soft clustering approaches work well for predicting clusters of highly expressed and suppressed genes.
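As an illustration of how two of the validity indices named above measure compactness and separation, the following is a minimal sketch of Dunn's Index and the Davies-Bouldin index using their common textbook definitions with Euclidean distance. The function names, the toy data, and the choice of distance are assumptions made for illustration; the abstract does not specify the exact variants used in the paper. The sketch assumes at least two clusters, each with non-zero spread.

```python
from itertools import combinations
from math import dist


def group_by_label(X, labels):
    """Collect points into one list per cluster label (hypothetical helper)."""
    clusters = {}
    for point, label in zip(X, labels):
        clusters.setdefault(label, []).append(tuple(point))
    return list(clusters.values())


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def dunn_index(X, labels):
    """Smallest between-cluster distance over largest cluster diameter.

    Higher values indicate compact, well-separated clusters.
    """
    clusters = group_by_label(X, labels)
    max_diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    min_separation = min(
        dist(p, q) for a, b in combinations(clusters, 2) for p in a for q in b
    )
    return min_separation / max_diameter


def davies_bouldin_index(X, labels):
    """Mean, over clusters, of the worst (scatter_i + scatter_j) / centroid distance.

    Lower values indicate compact, well-separated clusters.
    """
    clusters = group_by_label(X, labels)
    centroids = [centroid(c) for c in clusters]
    # Scatter: average distance from each member to its cluster centroid.
    scatter = [
        sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, centroids)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


# Toy data (not from the paper): two tight, well-separated clusters.
X = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
print(dunn_index(X, labels), davies_bouldin_index(X, labels))
```

The two indices point in opposite directions: a good partition gives a large Dunn's Index and a small Davies-Bouldin value, which is worth keeping in mind when comparing algorithms across several indices.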
Feature reduction lowers the dimensionality of a database and selects the more informative features by removing irrelevant ones. Selecting features in unsupervised learning scenarios is a harder problem than supervised feature selection because there are no class labels to guide the search for relevant features. Particle Swarm Optimization (PSO) is an evolutionary computation technique that finds globally optimal solutions in many applications. Rough set theory is a powerful tool for data reduction based on the dependency between attributes. This work combines the benefits of both PSO and rough sets. This paper describes a novel Unsupervised PSO based Quick Reduct (US-PSO-QR) for feature selection, which employs a population of particles existing within a multi-dimensional space. The performance of the proposed algorithm is compared with existing unsupervised feature selection methods, and its efficiency is measured using K-Means Clustering and Rough K-Means Clustering.
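To make "dependency between attributes" concrete, here is a minimal sketch of the standard rough-set dependency degree, gamma_P(Q) = |POS_P(Q)| / |U|, where POS_P(Q) is the set of objects whose P-equivalence class falls entirely inside one Q-equivalence class. This is the general rough-set definition, not the US-PSO-QR algorithm itself, whose fitness function the abstract does not spell out. The function names and the toy table are assumptions made for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes that agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def dependency_degree(rows, p_attrs, q_attrs):
    """Fraction of rows whose P-class lies wholly inside a single Q-class."""
    q_blocks = partition(rows, q_attrs)
    positive = 0
    for block in partition(rows, p_attrs):
        # A P-class belongs to the positive region if some Q-class contains it.
        if any(block <= q for q in q_blocks):
            positive += len(block)
    return positive / len(rows)


# Toy decision table (not from the paper): columns 0 and 1 are discretised
# features, column 2 is the attribute whose dependency is being measured.
rows = [
    (0, 0, "a"),
    (0, 1, "a"),
    (1, 0, "b"),
    (1, 1, "b"),
]
print(dependency_degree(rows, [0], [2]))  # column 2 fully determined by column 0
print(dependency_degree(rows, [1], [2]))  # column 1 says nothing about column 2
```

A value of 1 means the attributes in Q are fully determined by those in P, and 0 means no object can be classified with certainty. A quick-reduct style search would grow a feature subset until this degree stops improving; in a PSO variant, each particle would presumably encode a candidate subset scored by such a measure.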
Gene clustering is a familiar step in the exploratory analysis of high-dimensional biological data. It is the process of grouping genes with similar patterns into the same cluster, and it aims at analyzing gene functions in ways that support drug development and the early diagnosis of diseases. In recent years, many approaches based on nature-inspired meta-heuristic algorithms have been proposed. Cuckoo Search is one such optimization algorithm, inspired by the breeding strategy of a parasitic bird, the cuckoo. This paper proposes cuckoo search clustering and clustering using Levy flight cuckoo search for grouping a brain tumor gene expression dataset. A comparative study is made of genetic algorithm clustering, PSO clustering, cuckoo search clustering and clustering using Levy flight cuckoo search. A Levy flight is a random walk whose step lengths follow a heavy-tailed Levy distribution, which helps the search cover the entire search space. The breeding pattern of the cuckoo is associated with the genes that cause a tumor to grow and gradually affect other organs. The clusters generated by these algorithms are validated to assess the closeness among the genes within a cluster and the separation of genes between clusters. The experimental results reported in this paper show that cuckoo search clustering outperforms the other clustering methods used in the experiments.
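The role of the Levy flight can be sketched as follows. This minimal example draws heavy-tailed steps with Mantegna's algorithm, a common way to generate Levy flights in cuckoo search implementations, and uses them to perturb a candidate solution relative to the current best. The abstract does not state which step generator or parameters the paper uses, so Mantegna's algorithm, the function names, `alpha = 0.01` and `beta = 1.5` are all assumptions made for illustration.

```python
import math
import random


def mantegna_sigma(beta=1.5):
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(rng, beta=1.5):
    """One heavy-tailed step: mostly small moves with occasional long jumps."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_move(position, best, rng, alpha=0.01, beta=1.5):
    """Perturb each coordinate by a Levy step scaled by its distance to the best solution."""
    return [
        x + alpha * levy_step(rng, beta) * (x - b)
        for x, b in zip(position, best)
    ]


# Hypothetical usage: a candidate position (for example, flattened cluster
# centroids) is moved relative to the best position found so far.
rng = random.Random(42)
candidate = [0.2, 0.8, 0.5]
best_so_far = [0.3, 0.7, 0.4]
print(levy_move(candidate, best_so_far, rng))
```

Because the step distribution is heavy-tailed, most moves are local refinements while rare large jumps help the search escape local optima, which is the usual motivation for using Levy flights instead of Gaussian steps.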