Modern humans have populated Europe for more than 45,000 years1,2. Our knowledge of the genetic relatedness and structure of ancient hunter-gatherers is however limited, owing to the scarceness and poor molecular preservation of human remains from that period3. Here we analyse 356 ancient hunter-gatherer genomes, including new genomic data for 116 individuals from 14 countries in western and central Eurasia, spanning between 35,000 and 5,000 years ago. We identify a genetic ancestry profile in individuals associated with Upper Palaeolithic Gravettian assemblages from western Europe that is distinct from contemporaneous groups related to this archaeological culture in central and southern Europe4, but resembles that of preceding individuals associated with the Aurignacian culture. This ancestry profile survived during the Last Glacial Maximum (25,000 to 19,000 years ago) in human populations from southwestern Europe associated with the Solutrean culture, and with the following Magdalenian culture that re-expanded northeastward after the Last Glacial Maximum. Conversely, we reveal a genetic turnover in southern Europe suggesting a local replacement of human groups around the time of the Last Glacial Maximum, accompanied by a north-to-south dispersal of populations associated with the Epigravettian culture. From at least 14,000 years ago, an ancestry related to this culture spread from the south across the rest of Europe, largely replacing the Magdalenian-associated gene pool. After a period of limited admixture that spanned the beginning of the Mesolithic, we find genetic interactions between western and eastern European hunter-gatherers, who were also characterized by marked differences in phenotypically relevant variants.
Long DNA sequences shared between two individuals, known as Identical by descent (IBD) segments, are a powerful signal for identifying close and distant biological relatives because they only arise when the pair shares a recent common ancestor. Existing methods to call IBD segments between present-day genomes cannot be straightforwardly applied to ancient DNA data (aDNA) due to typically low coverage and high genotyping error rates. We present ancIBD, a method to identify IBD segments for human aDNA data implemented as a Python package. Our approach is based on a Hidden Markov Model, using as input genotype probabilities imputed based on a modern reference panel of genomic variation. Through simulation and downsampling experiments, we demonstrate that ancIBD robustly identifies IBD segments longer than 8 centimorgan for aDNA data with at least either 0.25x average whole-genome sequencing (WGS) coverage depth or at least 1x average depth for in-solution enrichment experiments targeting a widely used aDNA SNP set (‘1240k’). This application range allows us to screen a substantial fraction of the aDNA record for IBD segments and we showcase two downstream applications. First, leveraging the fact that biological relatives up to the sixth degree are expected to share multiple long IBD segments, we identify relatives between 10,156 ancient Eurasian individuals and document evidence of long-distance migration, for example by identifying a pair of two approximately fifth-degree relatives who were buried 1410km apart in Central Asia 5000 years ago. Second, by applying ancIBD, we reveal new details regarding the spread of ancestry related to Steppe pastoralists into Europe starting 5000 years ago. We find that the first individuals in Central and Northern Europe carrying high amounts of Steppe-ancestry, associated with the Corded Ware culture, share high rates of long IBD (12-25 cM) with Yamnaya herders of the Pontic-Caspian steppe, signaling a strong bottleneck and a recent biological connection on the order of only few hundred years, providing evidence that the Yamnaya themselves are a main source of Steppe ancestry in Corded Ware people. We also detect elevated sharing of long IBD segments between Corded Ware individuals and people associated with the Globular Amphora culture (GAC) from Poland and Ukraine, who were Copper Age farmers not yet carrying Steppe-like ancestry. These IBD links appear for all Corded Ware groups in our analysis, indicating that individuals related to GAC contexts must have had a major demographic impact early on in the genetic admixtures giving rise to various Corded Ware groups across Europe. These results show that detecting IBD segments in aDNA can generate new insights both on a small scale, relevant to understanding the life stories of people, and on the macroscale, relevant to large-scale cultural-historical events.
Motivation Human ancient DNA (aDNA) studies have surged in recent years, revolutionizing the study of the human past. Typically, aDNA is preserved poorly, making such data prone to contamination from other human DNA. Therefore, it is important to rule out substantial contamination before proceeding to downstream analysis. As most aDNA samples can only be sequenced to low coverages (<1x average depth), computational methods that can robustly estimate contamination in the low coverage regime are needed. However, the ultra low-coverage regime (0.1x and below) remains a challenging task for existing approaches. Results We present a new method to estimate contamination in aDNA for male modern humans. It utilizes a Li&Stephens haplotype copying model for haploid X chromosomes, with mismatches modelled as errors or contamination. We assessed this new approach, hapCon, on simulated and down-sampled empirical aDNA data. Our experiments demonstrate that hapCon outperforms a commonly used tool for estimating male X contamination (ANGSD), with substantially lower variance and narrower confidence intervals, especially in the low coverage regime. We found that hapCon provides useful contamination estimates for coverages as low as 0.1x for SNP capture data (1240k) and 0.02x for whole genome sequencing data (WGS), substantially extending the coverage limit of previous male X chromosome based contamination estimation methods. Our experiments demonstrate that hapCon has little bias for contamination up to 25-30% as long as the contaminating source is specified within continental genetic variation, and that its application range extends to human aDNA as old as ∼45,000 and various global ancestries. Availability We make hapCon available as part of a python package (hapROH), which is available at the Python Package Index (https://pypi.org/project/hapROH) and can be installed via pip. The documentation provides example use cases as blueprints for custom applications (https://haproh.readthedocs.io/en/latest/hapCon.html). The program can analyze either BAM files or pileup files produced with samtools. An implementation of our software (hapCon) using Python and C is deposited at https://github.com/hyl317/hapROH. Supplementary information Supplementary data are available at Bioinformatics online.
Human ancient DNA (aDNA) studies have surged in recent years, revolutionizing the study of the human past. Typically, aDNA is preserved poorly, making such data prone to contamination from other human DNA. Therefore, it is important to rule out substantial contamination before proceeding to downstream analysis. As most aDNA samples can only be sequenced to low coverages (<1x average depth), computational methods that can robustly estimate contamination in the low coverage regime are needed. However, the ultra low-coverage regime (0.1x and below) remains a challenging task for existing approaches. We present a new method to estimate contamination in aDNA for male individuals. It utilizes a Li&Stephen's haplotype copying model for haploid X chromosomes, with mismatches modelled as genotyping error or contamination. We assessed an implementation of this new approach, hapCon, on simulated and down-sampled empirical aDNA data. Our results demonstrate that hapCon outperforms a commonly used tool for estimating male X contamination (ANGSD), with substantially lower variance and narrower confidence intervals, especially in the low coverage regime. We found that hapCon provides useful contamination estimates for coverages as low as 0.1x for SNP capture data (1240k) and 0.02x for whole genome sequencing data (WGS), substantially extending the coverage limit of previous male X chromosome based contamination estimation methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.