Background: Classification of biological samples from gene expression data is a basic building block for several problems in bioinformatics, such as diagnosing cancer and other diseases and planning appropriate treatment. A major challenge in sample classification is handling high-dimensional, redundant gene expression data. Gene (feature) selection plays a major role in reducing the complexity of handling such data. Results: This paper explores the use of biological knowledge from the Gene Ontology database to select a suitable subset of genes, which is then used for clustering the samples. The proposed feature selection technique is unsupervised: it uses no class label information during gene selection. Finally, a multi-objective clustering approach is applied to cluster the available samples in the reduced gene space. Conclusions: The reported results show that incorporating biological knowledge into gene selection not only reduces the dimensionality of the feature space to a great extent but also improves the accuracy of sample classification. The reduced gene space is validated using biological significance tests. To demonstrate the advantage of the proposed gene-selection-based sample clustering technique, a thorough comparative analysis against state-of-the-art techniques has also been performed.
In the field of pattern recognition, the arrival of microarray-based technology has made it feasible to study the gene expression profiles of different tissue samples over different experimental conditions. In cancer research, classification of tissue samples is necessary for cancer diagnosis and can be carried out with the help of microarray technology. In this paper, we present a multiobjective optimization (MOO)-based clustering technique that uses archived multiobjective simulated annealing (AMOSA) as the underlying optimization strategy for classifying tissue samples from cancer datasets. The technique is evaluated on three open-source benchmark cancer datasets: Brain tumor, Adult Malignancy, and Small Round Blue Cell Tumors (SRBCT). To evaluate the quality of the produced clusters, two cluster quality measures, namely the adjusted Rand index (ARI) and classification accuracy (%CoA), are calculated. Comparative results against ten state-of-the-art clustering techniques are shown for the three benchmark datasets. We have also conducted a statistical significance test (t-test) to assess the superiority of the presented MOO-based clustering technique over the other clustering techniques. Moreover, significant gene markers have been identified from the obtained clustering solutions and demonstrated visually. This study can have an important impact in the field of cancer subtype prediction.
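As an illustration of the first quality measure named above, the following is a minimal sketch of the adjusted Rand index in its standard pair-counting form. The function name and the toy labels are assumptions made for this example; the abstract does not describe how the authors computed the index, and %CoA is not sketched because its exact definition is not given here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples.

    Standard pair-counting formulation; an illustrative sketch, not the
    authors' implementation.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same samples")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: the labelings trivially agree

    # Contingency counts: how many samples share each (true, predicted) pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put all samples together
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (hypothetical): cluster names differ but the partition is identical.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Toy labels (hypothetical): every predicted cluster mixes both true classes.
mixed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index is invariant to how clusters are named, which is why it suits comparing an unsupervised clustering against known tissue classes.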
In the field of pattern recognition, the arrival of microarray-based technology has made it feasible to study the gene expression profiles of different tissue samples over different experimental conditions. In cancer research, classification of tissue samples is necessary for cancer diagnosis and can be carried out with the help of microarray technology. In this article, we present a multi-objective optimization (MOO)-based clustering technique that uses AMOSA (Archived Multi-Objective Simulated Annealing) as the underlying optimization strategy for classifying tissue samples from cancer data sets. Three cluster validity indices, namely the XB, PBM, and FCM indices, are optimized simultaneously as objective functions to form more accurate clusters of tissue samples. The technique is evaluated on two open-source benchmark cancer data sets: the Brain tumor data set and the Adult Malignancy data set. To evaluate the quality of the produced clusters, two cluster quality measures, namely the Adjusted Rand Index (ARI) and Classification Accuracy (%CoA), are calculated for each data set. Comparative results against 10 state-of-the-art single-objective and multi-objective clustering algorithms are shown for the two benchmark data sets. 978-1-4799-3080-7/14/$31.00 © 2014 IEEE
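To make one of the objective functions concrete, here is a minimal sketch of the XB (Xie-Beni) index in a commonly cited form: total membership-weighted within-cluster scatter divided by the number of points times the smallest squared distance between two cluster centers, so lower values indicate compact, well-separated clusters. The abstract does not state the exact formulas used, so the fuzzifier `m = 2`, the function names, and the toy data below are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni_index(points, centers, memberships, m=2.0):
    """Xie-Beni index (lower is better).

    memberships[k][i] is the degree to which point i belongs to cluster k
    (0/1 for a crisp partition). The fuzzifier m = 2 is an assumed default.
    """
    if len(centers) < 2:
        raise ValueError("the index needs at least two clusters")

    # Numerator: membership-weighted scatter of points around their centers.
    compactness = sum(
        memberships[k][i] ** m * squared_distance(points[i], centers[k])
        for k in range(len(centers))
        for i in range(len(points))
    )
    # Denominator term: smallest squared distance between any two centers.
    separation = min(
        squared_distance(centers[k], centers[j])
        for k in range(len(centers))
        for j in range(k + 1, len(centers))
    )
    if separation == 0:
        raise ValueError("two cluster centers coincide")
    return compactness / (len(points) * separation)


# Toy 2-D data (hypothetical): two tight groups far apart, crisp memberships.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
memberships = [[1, 1, 0, 0], [0, 0, 1, 1]]
well_separated = xie_beni_index(points, [(0.0, 1.0), (10.0, 1.0)], memberships)
# Same memberships, but centers placed close together give a worse score.
poorly_placed = xie_beni_index(points, [(4.0, 1.0), (6.0, 1.0)], memberships)
```

In a multi-objective setting such as the one described, an index like this would be one coordinate of the objective vector evaluated for each candidate clustering, alongside the other validity indices.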
Gene Ontology (GO) is a dynamic vocabulary for describing the cellular functions of genes and proteins. It comprises three sub-ontologies: Biological Process, Cellular Component, and Molecular Function. GO has several applications in bioinformatics, such as measuring gene-gene or protein-protein semantic similarity and identifying genes or proteins by their GO annotations for disease gene and target discovery. Several measures have been proposed in the literature to determine semantic similarity between genes; these rely on the information content of GO terms, the GO tree structure, or a combination of both. However, most existing semantic similarity measures do not consider the different topological and information-theoretic aspects of GO terms collectively. Motivated by this, in this article we first propose three novel semantic similarity/distance measures for genes that cover different aspects of the GO tree. These measures are then embedded in the frameworks of well-known multi-objective and single-objective clustering algorithms to identify functionally similar genes. For comparative analysis, ten popular existing GO-based semantic similarity/distance measures and tools are also considered. Experimental results on Mouse genome, Yeast, and Human genome datasets demonstrate the advantage of multi-objective clustering algorithms used together with the proposed multi-factored similarity/distance measures. The clustering outcomes are further validated through biological and statistical significance tests.
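For readers unfamiliar with information-content-based measures, the sketch below shows the classic Resnik-style similarity on a toy ontology: the information content of a term is the negative log of its annotation probability, and two terms are as similar as their most informative common ancestor. This is an existing, well-known measure used only for illustration; it is not one of the three measures proposed in the article. The term names, the graph, and the annotation counts are all hypothetical.

```python
import math

# Hypothetical toy ontology: each term maps to its parent terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_ANNOTATIONS = {"root": 0, "A": 1, "B": 1, "A1": 2, "A2": 2, "B1": 2}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) counts annotations to t or its descendants."""
    cumulative = {term: 0 for term in PARENTS}
    for term, count in DIRECT_ANNOTATIONS.items():
        for ancestor in ancestors(term):
            cumulative[ancestor] += count
    total = cumulative["root"]
    return {t: -math.log(c / total) for t, c in cumulative.items() if c > 0}


def resnik_similarity(term_a, term_b, ic):
    """IC of the most informative common ancestor of two terms."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic[t] for t in common if t in ic)


def gene_similarity(terms_a, terms_b, ic):
    """Best term-pair score between two genes (one common convention)."""
    return max(resnik_similarity(a, b, ic) for a in terms_a for b in terms_b)


ic = information_content()
siblings = resnik_similarity("A1", "A2", ic)   # share the specific ancestor "A"
unrelated = resnik_similarity("A1", "B1", ic)  # share only the root
genes = gene_similarity(["A1", "B1"], ["A2"], ic)
```

A measure of this kind uses information content only; the limitation noted in the abstract is that topological aspects of the GO graph, such as depth or path length, are not captured by it.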