Background: The analysis of high-throughput gene expression data with respect to sets of genes rather than individual genes has many advantages. A variety of methods have been developed for assessing the enrichment of sets of genes with respect to differential expression. In this paper we provide a comparative study of four of these methods: Fisher's exact test, Gene Set Enrichment Analysis (GSEA), Random-Sets (RS), and Gene List Analysis with Prediction Accuracy (GLAPA). The first three methods use associative statistics, while the fourth uses predictive statistics. We first compare all four methods on simulated data sets to verify that Fisher's exact test is markedly worse than the other three approaches. We then validate the other three methods on seven real data sets with known genetic perturbations, and finally compare them on two cancer data sets where our a priori knowledge is limited.
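For readers unfamiliar with the simplest of the four methods, the following is a minimal sketch of a one-sided Fisher's exact test for gene set over-representation, computed as the upper tail of the hypergeometric distribution. It is a generic illustration, not the implementation used in the study; the function name, parameter names, and the gene counts in the usage lines are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_in_set, n_de, n_overlap):
    """One-sided Fisher's exact test for over-representation.

    n_total   -- number of genes measured (hypothetical parameter name)
    n_in_set  -- number of measured genes that belong to the gene set
    n_de      -- number of genes called differentially expressed
    n_overlap -- number of differentially expressed genes in the set

    Returns P(X >= n_overlap), where X is hypergeometric: the chance of
    seeing at least this much overlap if the differentially expressed
    genes were drawn at random from all measured genes.
    """
    if not (0 <= n_in_set <= n_total and 0 <= n_de <= n_total):
        raise ValueError("set and list sizes must lie in [0, n_total]")
    largest_overlap = min(n_in_set, n_de)
    if not (0 <= n_overlap <= largest_overlap):
        raise ValueError("overlap must lie in [0, min(n_in_set, n_de)]")

    # Exact integer arithmetic; comb() returns 0 for impossible tables.
    tail = sum(
        comb(n_in_set, i) * comb(n_total - n_in_set, n_de - i)
        for i in range(n_overlap, largest_overlap + 1)
    )
    return tail / comb(n_total, n_de)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the call.
    p = enrichment_p_value(n_total=20000, n_in_set=100, n_de=500, n_overlap=10)
    print(f"enrichment p-value: {p:.3g}")
```

The test reduces each gene to a binary call (differentially expressed or not) before testing, so it depends on the cutoff used to produce that list, whereas methods such as GSEA operate on the full ranked list of genes.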
Gene expression profiling offers a great opportunity for studying multifactorial diseases and for understanding the key role of genes in the mechanisms that drive a normal cell to a cancer state. Single-gene analysis is insufficient to describe the complex perturbations responsible for cancer onset, progression and invasion. A deeper understanding of the mechanisms of tumorigenesis can be reached by focusing on the deregulation of gene sets or pathways rather than on individual genes. We apply two known and statistically well-founded methods for finding pathways and biological processes deregulated in pathological conditions by analyzing gene expression profiles. In particular, we measure the amount of deregulation and assess the statistical significance of predefined pathways belonging to a curated collection (Molecular Signatures Database) in a colon cancer data set. We find that pathways strongly involved in different tumors are closely connected with colon cancer. Moreover, our experimental results show that studying complex diseases through pathway analysis can highlight genes weakly connected to the phenotype, which may be difficult to detect with classical univariate statistics. Our study shows the importance of using gene sets rather than single genes for understanding the main biological processes and pathways involved in colorectal cancer. Our analysis shows that many of the genes involved in these pathways are strongly associated with colorectal tumorigenesis. In this new perspective, the focus shifts from finding differentially expressed genes to identifying the biological processes, cellular functions and pathways perturbed in the phenotypic conditions. This is done by analyzing the genes co-expressed in a given pathway as a whole, taking into account the possible interactions among them and, more importantly, the correlation of their expression with the phenotypic conditions.