Ancient genomic sequences have started revealing the origin and the demographic impact of Neolithic farmers spreading into Europe1–3. The adoption of farming, stock breeding and sedentary societies during the Neolithic may have resulted in adaptive changes in genes associated with immunity and diet4. However, the limited data available from earlier hunter-gatherers precludes an understanding of the selective processes associated with this crucial transition to agriculture in recent human evolution. By sequencing a ~7,000-year-old Mesolithic skeleton discovered at the La Braña-Arintero site in León (Spain), we retrieved the first complete pre-agricultural European human genome. Analysis of this genome in the context of other ancient samples suggests the existence of a common ancient genomic signature across Western and Central Eurasia from the Upper Paleolithic to the Mesolithic. The La Braña individual carries ancestral alleles in several skin pigmentation genes, suggesting that the light skin of modern Europeans was not yet ubiquitous in Mesolithic times. Moreover, we provide evidence that a significant number of derived, putatively adaptive variants associated with pathogen resistance in modern Europeans were already present in this hunter-gatherer. Hence, these genomic variants cannot represent novel mutations that occurred during the adaptation to the farming lifestyle.
Since the publication of the Wellcome Trust Case Control Consortium (WTCCC) landmark study a decade ago, genome-wide association studies (GWAS) have led to the discovery of thousands of risk variants involved in disease etiology. This success story has two angles that are often overlooked. First, GWAS findings are highly replicable. This is an unprecedented phenomenon in complex trait genetics, and indeed in many areas of science, which in past decades have been plagued by false positives. At a time of increasing concerns about the lack of reproducibility, we examine the biological and methodological reasons that account for the replicability of GWAS and identify the challenges ahead. In contrast to the exemplary success of disease gene discovery, at present GWAS findings are not useful for predicting phenotypes. We close with an overview of the prospects for individualized prediction of disease risk and its foreseeable impact in clinical practice.
The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008–2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.
Recent results from large-scale genomic projects suggest that allele frequencies, which are highly relevant for medical purposes, differ considerably across different populations. The need for a detailed catalog of local variability motivated the whole-exome sequencing of 267 unrelated individuals, representative of the healthy Spanish population. Like in other studies, a considerable number of rare variants were found (almost one-third of the described variants). There were also relevant differences in allelic frequencies in polymorphic variants, including ∼10,000 polymorphisms private to the Spanish population. The allelic frequencies of variants conferring susceptibility to complex diseases (including cancer, schizophrenia, Alzheimer disease, type 2 diabetes, and other pathologies) were overall similar to those of other populations. However, the trend is the opposite for variants linked to Mendelian and rare diseases (including several retinal degenerative dystrophies and cardiomyopathies) that show marked frequency differences between populations. Interestingly, a correspondence between differences in allelic frequencies and disease prevalence was found, highlighting the relevance of frequency differences in disease risk. These differences are also observed in variants that disrupt known drug binding sites, suggesting an important role for local variability in population-specific drug resistances or adverse effects. We have made the Spanish population variant server web page that contains population frequency information for the complete list of 170,888 variant positions we found publicly available (http://spv.babelomics.org/), We show that it if fundamental to determine population-specific variant frequencies to distinguish real disease associations from population-specific polymorphisms.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.