We report a complex set of scaling relationships between mutation and reproduction in a simple model of a population. These follow from a consideration of patterns of genetic diversity in a sample of DNA sequences. Five different possible limit processes, each with a different scaled mutation parameter, can be used to describe genetic diversity in a large population. Only one of these corresponds to the usual population genetic model, and the others make drastically different predictions about genetic diversity. The complexity arises because individuals can potentially have very many offspring. To the extent that this occurs in a given species, our results imply that inferences from genetic data made under the usual assumptions are likely to be wrong. Our results also uncover a fundamental difference between populations in which generations are overlapping and those in which generations are discrete. We choose one of the five limit processes that appears to be appropriate for some marine organisms and use a sample of genetic data from a population of Pacific oysters to infer the parameters of the model. The data suggest the presence of rare reproduction events in which approximately 8% of the population is replaced by the offspring of a single individual.
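The rare, large reproduction events described in this abstract can be pictured with a minimal forward-time sketch. It assumes a haploid Moran-type population with overlapping generations in which, at each step, one parent either replaces a single individual or, with a small probability, replaces a fraction of the whole population. The names `reproduction_step`, `psi`, and `big_event_prob` are hypothetical and chosen for illustration; this is not the authors' code, and the exact scaling of the event probability with population size is not modelled here.

```python
import random


def reproduction_step(population, psi, big_event_prob, rng):
    """Apply one Moran-type reproduction step and return (new_population, n_offspring).

    population     : list of allele labels, one per haploid individual
    psi            : fraction of the population replaced in a large event (assumed name)
    big_event_prob : probability that a step is a large event (assumed name)
    rng            : a random.Random instance, passed in for reproducibility
    """
    n = len(population)
    parent = rng.randrange(n)

    # Decide how many offspring the single parent contributes in this step.
    if rng.random() < big_event_prob:
        n_offspring = max(1, round(psi * n))
    else:
        n_offspring = 1

    # Overlapping generations: the parent survives, and the offspring
    # replace individuals drawn without replacement from the rest.
    others = [i for i in range(n) if i != parent]
    dying = rng.sample(others, n_offspring)

    new_population = list(population)
    for i in dying:
        new_population[i] = population[parent]
    return new_population, n_offspring


# Usage: 100 individuals with distinct labels, forcing one large event.
rng = random.Random(1)
pop, k = reproduction_step(list(range(100)), psi=0.08, big_event_prob=1.0, rng=rng)
print(k, len(pop))
```

With `psi=0.08` and 100 individuals, a large event replaces 8 individuals, so 9 copies of the parental label are present afterwards while the population size stays constant.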
Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime’s many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.
A large offspring-number diploid biparental multilocus population model of Moran type is our object of study. At each time step, a pair of diploid individuals drawn uniformly at random contributes offspring to the population. The number of offspring can be large relative to the total population size. Similar "heavily skewed" reproduction mechanisms have been recently considered by various authors (cf. e.g., Eldon and Wakeley 2006, 2008) and reviewed by Hedgecock and Pudovkin (2011). Each diploid parental individual contributes exactly one chromosome to each diploid offspring, and hence ancestral lineages can coalesce only when in distinct individuals. A separation-of-timescales phenomenon is thus observed. A result of Möhle (1998) is extended to obtain convergence of the ancestral process to an ancestral recombination graph necessarily admitting simultaneous multiple mergers of ancestral lineages. The usual ancestral recombination graph is obtained as a special case of our model when the parents contribute only one offspring to the population each time. Due to diploidy and large offspring numbers, novel effects appear. For example, the marginal genealogy at each locus admits simultaneous multiple mergers in up to four groups, and different loci remain substantially correlated even as the recombination rate grows large. Thus, genealogies for loci far apart on the same chromosome remain correlated. Correlation in coalescence times for two loci is derived and shown to be a function of the coalescence parameters of our model. Extending the observations by Eldon and Wakeley (2008), predictions of linkage disequilibrium are shown to be functions of the reproduction parameters of our model, in addition to the recombination rate. 
Simulations suggest that, in large offspring-number populations, correlations in ratios of coalescence times between loci can be high even when the recombination rate is high and the sample size is large, which hints at how to distinguish between different population models.
Diploidy, in which each offspring receives two sets of chromosomes, one from each of two distinct diploid parents, is fairly common among natural populations. Mathematical models in population genetics tend to assume, however, that all individuals in a population are haploid, which simplifies the mathematics. Mendel's laws describe the mechanism of inheritance as composed of two main steps, equal segregation (first law) and independent assortment (second law). The first law states that gametes are haploid, i.e., carry only one of each pair of homologous chromosomes. Most models in population genetics are thus models of chromosomes or gene copies. Mendel's second law states that alleles at different genes, or loci, assort independently into gametes. Linkage of alleles on chromosomes, resulting in nonrandom association of alleles at different loci into gametes, is of course an important exception to the second law.
Coalescent processes (Kingman 1982a,b; Hudson 1983b; Tajima 1983) describe the ancestral relations of chromosome...
Statistical properties of the site-frequency spectrum associated with Λ-coalescents are our objects of study. In particular, we derive recursions for the expected value, variance, and covariance of the spectrum, extending earlier results for the classical Kingman coalescent. Estimating coalescent parameters introduced by certain Λ-coalescents for data sets too large for full-likelihood methods is our focus. The recursions for the expected values we obtain can be used to find the parameter values that give the best fit to the observed frequency spectrum. The expected values are also used to approximate the probability that a (derived) mutation arises on a branch subtending a given number of leaves (DNA sequences), allowing us to apply a pseudolikelihood inference to estimate coalescence parameters associated with certain subclasses of Λ-coalescents. The properties of the pseudolikelihood approach are investigated on simulated as well as real mtDNA data sets for the high-fecundity Atlantic cod (Gadus morhua). Our results for two subclasses of Λ-coalescents show that one can distinguish these subclasses from the Kingman coalescent, as well as between the Λ-subclasses, even for a moderate (maybe a few hundred) sample size.
Large offspring number population models have recently been proposed as appropriate models with which to investigate high-fecundity natural populations. Some marine populations may belong to the class of high-fecundity populations, including Pacific oysters (Crassostrea gigas) (Beckenbach 1994; Li and Hedgecock 1998; Boudry et al. 2002), white sea bream (Diplodus sargus) (Planes and Lenfant 2002), and Atlantic cod (Gadus morhua) (Árnason 2004). Oysters feature in the elm and oyster model of Williams (1975) as an example of a high-fecundity population. Indeed, high-fecundity populations are discussed at length by Williams (1975) when comparing the benefits of sexual vs. asexual reproduction. Avise et al. (1988) compare genetic distances for mitochondrial (mt)DNA variation for three vertebrate species, American eels (Anguilla rostrata), hardhead catfish (Arius felis), and red-winged blackbirds (Agelaius phoeniceus), and conclude that historical effective population sizes may have been much lower than current census size. Low effective population size compared to census population size, observed for certain marine populations in particular (e.g., Hedgecock et al. 1992) and reviewed by Hedgecock and Pudovkin (2011), may be evidence of high variance in offspring distribution. Indeed, Hedrick (2005) observes that low effective population size results from high variance in reproductive success in a population with large census size. High fecundity may also be a way for certain marine organisms with broadcast spawning to compensate for high mortality rate among juveniles, thus exhibiting type III survivorship curves.
Multiple-merger coalescent processes, so-called Λ- and Ξ-coalescents, arise naturally from large offspring number models (Donnelly and Kurtz 1999; Sagitov 1999, 2003; Möhle and Sagitov 200...
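To make the notion of a Λ-coalescent (a multiple-merger coalescent) concrete, the sketch below computes the standard merger rates λ_{b,k} = ∫ x^{k-2}(1-x)^{b-k} Λ(dx), the rate at which a given group of k out of b ancestral lineages merges. Two commonly studied choices of Λ are shown: the Beta(2-α, α) distribution with 1 < α < 2, and a point mass at ψ. That these match the subclasses analysed in the abstract above is an assumption, and the function names are hypothetical.

```python
from math import comb, exp, lgamma


def beta_fn(a, b):
    """Euler beta function B(a, b), computed through log-gamma for stability."""
    return exp(lgamma(a) + lgamma(b) - lgamma(a + b))


def beta_coalescent_rate(b, k, alpha):
    """Rate at which a given set of k out of b lineages merges when
    Lambda is the Beta(2 - alpha, alpha) distribution, 1 < alpha < 2."""
    if not (2 <= k <= b) or not (1 < alpha < 2):
        raise ValueError("need 2 <= k <= b and 1 < alpha < 2")
    return beta_fn(k - alpha, b - k + alpha) / beta_fn(2 - alpha, alpha)


def point_mass_rate(b, k, psi):
    """Rate at which a given set of k out of b lineages merges when
    Lambda is a point mass at psi, 0 < psi <= 1."""
    if not (2 <= k <= b) or not (0 < psi <= 1):
        raise ValueError("need 2 <= k <= b and 0 < psi <= 1")
    return psi ** (k - 2) * (1 - psi) ** (b - k)


def total_merger_rate(b, rate, param):
    """Total rate of any merger among b lineages: sum over k of C(b, k) * lambda_{b,k}."""
    return sum(comb(b, k) * rate(b, k, param) for k in range(2, b + 1))


# Usage: total merger rate among 10 lineages under each choice of Lambda.
print(total_merger_rate(10, beta_coalescent_rate, 1.5))
print(total_merger_rate(10, point_mass_rate, 0.08))
```

Both families satisfy the consistency relation λ_{b,k} = λ_{b+1,k} + λ_{b+1,k+1}, which is a convenient sanity check on any implementation. The Kingman coalescent corresponds to Λ concentrated at 0, where only pairwise mergers (k = 2) have positive rate.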
The ability of the site-frequency spectrum (SFS) to reflect the particularities of gene genealogies exhibiting multiple mergers of ancestral lines as opposed to those obtained in the presence of population growth is our focus. An excess of singletons is a well-known characteristic of both population growth and multiple mergers. Other aspects of the SFS, in particular, the weight of the right tail, are, however, affected in specific ways by the two model classes. Using an approximate likelihood method and minimum-distance statistics, our estimates of statistical power indicate that exponential and algebraic growth can indeed be distinguished from multiple-merger coalescents, even for moderate sample sizes, if the number of segregating sites is high enough. A normalized version of the SFS (nSFS) is also used as a summary statistic in an approximate Bayesian computation (ABC) approach. The results give further positive evidence as to the general eligibility of the SFS to distinguish between the different histories.
KEYWORDS coalescent; multiple mergers; population growth; approximate maximum likelihood test; approximate Bayesian computation; site-frequency spectrum
The site-frequency spectrum (SFS) at a given locus is one of the most important and popular statistics based on genetic data sampled from a natural population. In combination with the postulation of the assumptions of the infinitely-many-sites mutation model (Watterson, 1975) and a suitable underlying coalescent framework, the SFS allows one to draw inferences about evolutionary parameters, such as coalescent parameters associated with multiple-merger coalescents or population-growth models.
The Kingman coalescent, developed by Kingman (1982a,b,c), Hudson (1983a,b), and Tajima (1983), describing the random ancestral relations among DNA sequences drawn from natural populations, is a prominent and widely used model from which one can make predictions about genetic diversity.
Many quantities of interest, such as the expected values and covariances of the SFS associated with the Kingman coalescent, are easily computed thanks to results by Fu (1995). The robustness of the Kingman coalescent is quite remarkable; indeed, a large number of genealogy models can be shown to have the Kingman coalescent or a variant thereof as their limit process (cf., e.g., Möhle 1998). A large volume of work is thus devoted to inference methods based on the Kingman coalescent [see, e.g., Donnelly and Tavaré (1995), Hudson (1990), Nordborg (2001), Hein et al. (2005), and Wakeley (2007) for reviews].
However, many evolutionary histories can lead to significant deviations from the Kingman coalescent model. Such deviations can be detected using a variety of statistical tools, such as Tajima's D (Tajima 1989a), Fu and Li's D (Fu and Li 1993), and Fay and Wu's H (Fay and Wu 2000), which are all functions of the SFS. However, they do not always allow one to identify the actual evolutionary mechanisms leading to such deviations. Developing statistical tools that allow one to dist...
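As an illustration of a test statistic that is a function of the SFS, the sketch below computes Tajima's D from an unfolded spectrum, together with the expected spectrum under the Kingman coalescent, E[ξ_i] = θ/i. The constants follow the standard published definition of Tajima's D. The function names and the list-based representation of the SFS are assumptions made for this example.

```python
from math import sqrt


def expected_kingman_sfs(n, theta):
    """Expected unfolded SFS under the Kingman coalescent: E[xi_i] = theta / i, i = 1..n-1."""
    return [theta / i for i in range(1, n)]


def tajimas_d(sfs):
    """Tajima's D from an unfolded SFS, where sfs[i - 1] is the number of
    sites at which the derived allele is carried by i of the n sequences."""
    n = len(sfs) + 1
    s = sum(sfs)  # number of segregating sites
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and one segregating site")

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))

    # Mean number of pairwise differences, computed directly from the SFS.
    n_pairs = n * (n - 1) / 2
    pi = sum(x * i * (n - i) for i, x in enumerate(sfs, start=1)) / n_pairs

    # Constants for the variance of (pi - S / a1).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Usage: a spectrum made up only of singletons, for a sample of 5 sequences.
print(tajimas_d([10, 0, 0, 0]))
```

A spectrum with an excess of singletons gives a negative D, which by itself does not indicate whether population growth or multiple mergers produced the excess. Plugging in the expected Kingman spectrum gives a numerator of zero, because both estimators of θ then coincide.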