The standard approach to the analysis of genome-wide association studies (GWAS) is to test each position in the genome individually for a statistically significant association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that accounts for correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008–2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI achieves higher power and precision than the examined alternatives, yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS. More than 80% of the discoveries made by COMBI on WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as part of the GWASpi toolbox 2.0.
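The two-step idea described above (screen SNPs with a support vector machine, then test only the selected candidates against a threshold) can be illustrated with a minimal sketch. This is not the COMBI implementation: all function names are hypothetical, the Pegasos-style subgradient training and the correlation-based 1-df chi-square test are choices made here for illustration, and the significance threshold `t_star` is simply passed in because its calibration is not specified in the abstract.

```python
import math
import random


def center_columns(X):
    """Subtract each column mean so a bias-free linear model is reasonable."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss, L2 penalty) via Pegasos-style subgradient steps.

    Assumption: labels y are +1 / -1 and there is no bias term.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularizer)
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def trend_p_value(x, y):
    """P-value of the statistic n * r^2 under a chi-square with 1 df."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # no variation, so no evidence of association
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    stat = n * sxy * sxy / (sxx * syy)
    return math.erfc(math.sqrt(stat / 2.0))


def screen_then_test(X, y, k, t_star):
    """Step 1: keep the k SNPs with largest |SVM weight|. Step 2: test them."""
    w = train_linear_svm(center_columns(X), y)
    ranked = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    hits = {}
    for j in ranked[:k]:
        p = trend_p_value([row[j] for row in X], y)
        if p < t_star:  # t_star is supplied by the caller (assumption)
            hits[j] = p
    return hits


def make_toy_data(n=200, d=20, causal=3, maf=0.3, flip=0.05, seed=1):
    """Synthetic genotypes (0/1/2) where one hypothetical SNP drives the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(d)]
        label = 1 if row[causal] >= 1 else -1
        if rng.random() < flip:
            label = -label
        X.append(row)
        y.append(label)
    return X, y
```

Calling `screen_then_test(X, y, k=5, t_star=1e-4)` on the toy data returns a dictionary mapping the indices of the retained SNPs to their p-values. Restricting the tests to the screened candidates is what lets the second step use a less severe threshold than a genome-wide correction, provided the threshold is calibrated with the selection step in mind.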
Aging is a complex process affecting different species and individuals in different ways. Comparing genetic variation across species with their aging phenotypes will help us understand the molecular basis of aging and longevity. Although most studies on aging have so far focused on short-lived model organisms, recent comparisons of genomic, transcriptomic, and metabolomic data across lineages with different lifespans are unveiling molecular signatures associated with longevity. Here, we examine the relationship between genomic variation and maximum lifespan across primate species. We used two different approaches. First, we searched for parallel amino-acid mutations that co-occur with increases in longevity across the primate lineage. Twenty-five such amino-acid variants were identified, several of which have been previously reported by studies with different experimental setups and in different model organisms. The genes harboring these mutations are mainly enriched in functional categories such as wound healing, blood coagulation, and cardiovascular disorders. We demonstrate that these pathways are highly enriched for pleiotropic effects, as predicted by the antagonistic pleiotropy theory of aging. The second approach focused on changes in rates of protein evolution across the primate phylogeny. Using phylogenetic generalized least squares, we show that some genes exhibit strong correlations between their evolutionary rates and longevity-associated traits. These include genes in the Sphingosine 1-phosphate pathway, PI3K signaling, and the Thrombin/protease-activated receptor pathway, among other cardiovascular processes. Together, these results shed light on human senescence patterns and underscore the power of comparative genomics to identify pathways related to aging and longevity.
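For reference, phylogenetic generalized least squares has the standard GLS form shown below. The notation is an assumption introduced here for illustration, not taken from the paper: $y$ is a vector of trait values across species, $X$ is the design matrix of predictors, and $C$ is the covariance matrix among species implied by the phylogeny under a chosen evolutionary model (for example Brownian motion). The abstract does not say which variable was treated as the response.

```latex
y = X\beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}\!\left(0,\ \sigma^{2} C\right),
\qquad
\hat{\beta} = \left(X^{\top} C^{-1} X\right)^{-1} X^{\top} C^{-1} y
```

With $C$ equal to the identity matrix this reduces to ordinary least squares, which is why ignoring shared ancestry amounts to treating species as independent observations.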
The enormous variation in mammalian lifespan is the result of each species' adaptations to its own biological trade-offs and ecological conditions. Comparative genomics has demonstrated that the genomic factors underlying both species lifespans and the longevity of individuals are in part shared across the tree of life. Here, we compared protein-coding regions across the mammalian phylogeny to detect individual amino-acid (AA) changes shared by the most long-lived mammals and genes whose rates of protein evolution correlate with longevity. We discovered a total of 2,737 AA in 2,004 genes that distinguish long- and short-lived mammals, significantly more than expected by chance (p = 0.003). These genes belong to pathways involved in regulating lifespan, such as inflammatory response and hemostasis. Among them, a total of 1,157 AA showed a significant association with maximum lifespan in a phylogenetic test. Interestingly, most of the detected AA positions do not vary in extant human populations (81.2%) or have allele frequencies below 1% (99.78%). Consequently, almost none of these putatively important variants could have been detected by Genome-Wide Association Studies (GWAS). Additionally, we identified four more genes whose rate of protein evolution correlated with longevity in mammals. Crucially, SNPs located in the detected genes explain a larger fraction of human lifespan heritability than expected, demonstrating for the first time that comparative genomics can be used to enhance the interpretation of human GWAS. Finally, we show that the human longevity-associated proteins are significantly more stable than the orthologous proteins from short-lived mammals, strongly suggesting that general protein stability is linked to increased lifespan.