Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime’s many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.
Understanding variation in allele frequencies across populations is a central goal of population genetics. Classical models for the distribution of allele frequencies, using forward simulation, coalescent theory, or the diffusion approximation, have been applied extensively for demographic inference, medical study design, and evolutionary studies. Here we propose a tractable model of ordinary differential equations for the evolution of allele frequencies that is closely related to the diffusion approximation but avoids many of its limitations and approximations. We show that the approach is typically faster, more numerically stable, and more easily generalizable than the state-of-the-art software implementation of the diffusion approximation. We present a number of applications to human sequence data, including demographic inference with a five-population joint frequency spectrum and a discussion of the robustness of the out-of-Africa model inference to the choice of modern population.
The explosion in population genomic data demands ever more complex modes of analysis, and increasingly these analyses depend on sophisticated simulations. Re-cent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.
We learn about population history and underlying evolutionary biology through patterns of genetic polymorphism. Many approaches to reconstruct evolutionary histories focus on a limited number of informative statistics describing distributions of allele frequencies or patterns of linkage disequilibrium. We show that many commonly used statistics are part of a broad family of two-locus moments whose expectation can be computed jointly and rapidly under a wide range of scenarios, including complex multi-population demographies with continuous migration and admixture events. A full inspection of these statistics reveals that widely used models of human history fail to predict simple patterns of linkage disequilibrium. To jointly capture the information contained in classical and novel statistics, we implemented a tractable likelihood-based inference framework for demographic history. Using this approach, we show that human evolutionary models that include archaic admixture in Africa, Asia, and Europe provide a much better description of patterns of genetic diversity across the human genome. We estimate that an unidentified, deeply diverged population admixed with modern humans within Africa both before and after the split of African and Eurasian populations, contributing 4 − 8% genetic ancestry to individuals in world-wide populations.
We learn about population history and underlying evolutionary biology through patterns of genetic polymorphism. Many approaches to reconstruct evolutionary histories focus on a limited number of informative statistics describing distributions of allele frequencies or patterns of linkage disequilibrium. We show that many commonly used statistics are part of a broad family of two-locus moments whose expectation can be computed jointly and rapidly under a wide range of scenarios, including complex multi-population demographies with continuous migration and admixture events. A full inspection of these statistics reveals that widely used models of human history fail to predict simple patterns of linkage disequilibrium. To jointly capture the information contained in classical and novel statistics, we implemented a tractable likelihood-based inference framework for demographic history. Using this approach, we show that human evolutionary models that include archaic admixture in Africa, Asia, and Europe provide a much better description of patterns of genetic diversity across the human genome. We estimate that an unidentified, deeply diverged population admixed with modern humans within Africa both before and after the split of African and Eurasian populations, contributing 4 − 8% genetic ancestry to individuals in world-wide populations. Author SummaryThroughout human history, populations have expanded and contracted, split and merged, and exchanged migrants. Because these events affected genetic diversity, we can learn about human history by comparing predictions from evolutionary models to genetic data. Here, we show how to rapidly compute such predictions for a wide range of diversity measures within and across populations under complex demographic scenarios. While widely used models of human history accurately predict common measures of diversity, we show that they strongly underestimate the co-occurence of low frequency mutations within human populations in Asia, Europe, and Africa. Models allowing for archaic admixture, the relatively recent mixing of human populations with deeply diverged human lineages, resolve this discrepancy. We use such models to infer demographic models that include both recent and ancient features of human history. We recover the well-characterized admixture of Neanderthals in Eurasian populations, as well as admixture from an as-yet unknown diverged human population within Africa, further suggesting that admixture with deeply diverged lineages occurred multiple times in human history. By simultaneously testing model predictions for a broad range of diversity statistics, we can assess the robustness of common evolutionary models, identify missing historical events, and build more informed models of human demography.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.