Analysis of genomic data requires an efficient way to calculate likelihoods across very large numbers of loci. We describe a general method for finding the distribution of genealogies: we allow migration between demes, splitting of demes [as in the isolation-with-migration (IM) model], and recombination between linked loci. These processes are described by a set of linear recursions for the generating function of branch lengths. Under the infinite-sites model, the probability of any configuration of mutations can be found by differentiating this generating function. Such calculations are feasible for small numbers of sampled genomes: as an example, we show how the generating function can be derived explicitly for three genes under the two-deme IM model. This derivation is done automatically, using Mathematica. Given data from a large number of unlinked and nonrecombining blocks of sequence, these results can be used to find maximum-likelihood estimates of model parameters by tabulating the probabilities of all relevant mutational configurations and then multiplying across loci. The feasibility of the method is demonstrated by applying it to simulated data and to a data set previously analyzed by Wang and Hey (2010) consisting of 26,141 loci sampled from Drosophila simulans and D. melanogaster. Our results suggest that such likelihood calculations are scalable to genomic data as long as the numbers of sampled individuals and mutations per sequence block are small.
The coalescent process is highly variable: samples from even a single well-mixed population rapidly coalesce down to a few ancestral lineages, so that their deeper ancestry is determined by just a few random coalescence events (Felsenstein 1992). Thus, small samples taken from a large number of loci give much more information than large samples from a few loci.
For example, the distribution of coalescence times, and hence the history of effective population size, has been inferred from single diploid genomes (Li and Durbin 2011). Although it is now feasible to sample very large numbers of markers, or indeed whole genomes, we urgently need methods for analyzing such data. In principle, we can calculate likelihoods from very large data sets, if we have loosely linked blocks of sequence within which recombination is negligible. Provided that only a few genomes are sampled, we can tabulate the probability that any particular configuration of mutations will be seen at each locus and then multiply across large numbers of loci to find the likelihood of our model (Takahata et al. 1995). Wilkinson-Herbots (2008) and Wang and Hey (2010) derive the distribution of coalescence times for a pair of genes sampled from two populations that separated at some time in the past and subsequently exchanged migrants. This "isolation-with-migration" (IM) model is of particular interest in evaluating the role of gene flow during speciation. Hobolth et al. (2011) show how this and similar calculations can be done more efficiently using matrix exponentials. Here, we p...
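The tabulate-and-multiply idea described above can be sketched for the simplest possible case. This is an illustration only, not the IM-model calculation of the paper: it assumes a pair of genes sampled from a single well-mixed population, unlinked nonrecombining blocks, and the infinite-sites model. Under those assumptions the number of mutations k separating the pair is geometric, P(k) = theta^k / (1 + theta)^(k + 1), where theta is the scaled mutation rate per block. The function names and the example counts are hypothetical.

```python
import math
from collections import Counter


def prob_k_mutations(k, theta):
    """P(k mutations between two genes), single population, infinite sites.

    Assumption: theta is the scaled mutation rate per block summed over
    both lineages, with time measured so that the pair coalesces at rate 1.
    """
    return theta ** k / (1.0 + theta) ** (k + 1)


def log_likelihood(counts, theta):
    """Multiply across loci, done on the log scale.

    counts maps each mutational configuration (here just k) to the
    number of loci showing it, so each probability is evaluated once.
    """
    return sum(n * math.log(prob_k_mutations(k, theta))
               for k, n in counts.items())


def mle_theta(counts, lo=1e-6, hi=50.0, tol=1e-8):
    """Golden-section search for the theta that maximizes the likelihood.

    Valid here because this log-likelihood has a single maximum in theta.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if log_likelihood(counts, c) > log_likelihood(counts, d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2.0


# Made-up per-locus mutation counts, tabulated into configuration counts.
per_locus = [0] * 50 + [1] * 30 + [2] * 15 + [3] * 5
counts = Counter(per_locus)
theta_hat = mle_theta(counts)
print(f"estimated theta: {theta_hat:.4f}")
print(f"log-likelihood:  {log_likelihood(counts, theta_hat):.4f}")
```

Because the cost depends on the number of distinct configurations rather than the number of loci, the same loop would handle tens of thousands of blocks. For this toy model the estimate should agree with the mean number of mutations per locus, which gives a quick sanity check on the search.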
Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime’s many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.
How geographically widespread biological communities assemble remains a major question in ecology. Do parallel population histories allow sustained interactions (such as host-parasite or plant-pollinator) among species, or do discordant histories necessarily interrupt them? Though few empirical data exist, these issues are central to our understanding of multispecies evolutionary dynamics. Here we use hierarchical approximate Bayesian analysis of DNA sequence data for 12 herbivores and 19 parasitoids to reconstruct the assembly of an insect community spanning the Western Palearctic and assess the support for alternative host tracking and ecological sorting hypotheses. We show that assembly occurred primarily by delayed host tracking from a shared eastern origin. Herbivores escaped their enemies for millennia before parasitoid pursuit restored initial associations, with generalist parasitoids no better able to track their hosts than specialists. In contrast, ecological sorting played only a minor role. Substantial turnover in host-parasitoid associations means that coevolution must have been diffuse, probably contributing to the parasitoid generalism seen in this and similar systems. Reintegration of parasitoids after host escape shows these communities to have been unsaturated throughout their history, arguing against major roles for parasitoid niche evolution or competition during community assembly.
Approximate Bayesian computation (ABC) techniques have seen rapid and accelerating development in biology, with applications including population genetics, systems biology, and community ecology (reviewed in Beaumont 2010; Csilléry et al. 2010). However, the approximations and model assumptions inherent in ABC can make model choice and parameter estimation problematic, and careful simulation-based validation and assessment of posterior predictive power are required (Gelman et al.