Traditional unbiased genome-wide association studies (GWAS) have successfully identified thousands of loci associated with various complex diseases, but there is evidence that many variants were missed at stringent genome-wide thresholds. Fortunately, a rapidly increasing amount of prior knowledge in publicly available genomic datasets and biological databases can be harnessed to enhance the power of discovering SNPs/genes from existing or new GWAS datasets. For most diseases, many of the identified loci tend to cluster into a few specific biological pathways/networks. From the point of view of disease etiology, such clustering is generally to be expected. This phenomenon can be exploited to conduct a more powerful genome-wide scan that is tailored to identify loci that are interconnected in pathways. We propose a scalable regression-based analytical framework to enable such a pathway-guided GWAS and demonstrate that it provides significant gains in power to detect disease-associated SNPs. Our method requires two inputs, namely a) genome-wide summary-level data (e.g., SNP p-values) and b) a grouping of genes into biologically meaningful categories (e.g., a database of pathways). It automatically adjusts the input p-values by incorporating knowledge derived adaptively from the data and the specified pathways. The method uses a regularized logistic regression analysis to derive a prior for each SNP and then re-weights the SNP p-values so as to maximize the overall power of making discoveries. It increases the power to discover SNPs co-clustering into some of these pathways, while maintaining the global type-1 error (family-wise error rate, FWER) at the desired level. We used whole-genome simulations and summary data from real GWA studies of psoriasis, SLE, coronary artery disease and type-2 diabetes to illustrate the power improvement achieved by pathway-guided search.
Our pipeline, implemented as an R package, can flexibly handle a large number of prior annotations, possibly derived from multiple databases.
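The re-weighting step can be illustrated with a minimal sketch. It is not the authors' R package and does not reproduce their power-maximizing weights or the regularized logistic regression that produces the priors. It assumes instead a plain weighted Bonferroni rule with weights proportional to externally supplied prior scores; the function names `weights_from_priors` and `weighted_bonferroni` and the toy numbers are hypothetical. With weights that average to 1 and are fixed independently of the p-values, the per-SNP thresholds sum to alpha, so the union bound keeps the FWER at or below alpha.

```python
from typing import List, Sequence


def weights_from_priors(priors: Sequence[float]) -> List[float]:
    """Scale non-negative prior scores so that the weights average to 1.

    Assumption: weights proportional to the priors. This is a simple
    stand-in, not the power-maximizing scheme described in the abstract.
    """
    if not priors:
        raise ValueError("priors must be non-empty")
    if any(p < 0 for p in priors):
        raise ValueError("priors must be non-negative")
    total = sum(priors)
    if total == 0:
        raise ValueError("at least one prior must be positive")
    m = len(priors)
    return [m * p / total for p in priors]


def weighted_bonferroni(
    pvalues: Sequence[float], weights: Sequence[float], alpha: float = 0.05
) -> List[bool]:
    """Reject hypothesis i when p_i <= alpha * w_i / m.

    The thresholds sum to alpha when the weights average to 1, so the
    family-wise error rate is bounded by alpha (union bound).
    """
    m = len(pvalues)
    if len(weights) != m:
        raise ValueError("pvalues and weights must have the same length")
    return [p <= alpha * w / m for p, w in zip(pvalues, weights)]


# Toy data (made up for illustration): four SNPs and hypothetical prior
# scores, with the second SNP sitting in a strongly implicated pathway.
pvalues = [1e-8, 0.02, 0.004, 0.5]
priors = [1, 6, 2, 1]

weights = weights_from_priors(priors)
weighted = weighted_bonferroni(pvalues, weights)
unweighted = weighted_bonferroni(pvalues, [1.0] * len(pvalues))
print("weighted:  ", weighted)
print("unweighted:", unweighted)
```

In this toy example the second SNP fails the unweighted threshold but passes once its pathway prior raises its weight, which is the kind of gain the abstract describes. SNPs with low priors face a stricter threshold in exchange, so the total error budget is unchanged.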