Most species are structured and influenced by processes that either increased or reduced gene flow between populations. However, most population genetic inference methods assume panmixia and reconstruct a history characterized by population size changes. This is potentially problematic, as population structure can generate spurious signals of population size change through time. Moreover, when the model assumed for demographic inference is misspecified, genomic data will likely increase the precision of misleading if not meaningless parameters. For instance, if data were generated under an n-island model (characterized by the number of islands and migrants exchanged), inference based on a model of population size change would produce precise but meaningless estimates of a bottleneck. In addition, archaeological or climatic events around the bottleneck's timing might provide a reasonable but potentially misleading scenario. In a context of model uncertainty (panmixia versus structure), genomic data may thus not necessarily lead to improved statistical inference. We consider two haploid genomes and develop a theory that explains why any demographic model with structure will necessarily be interpreted as a series of changes in population size by inference methods ignoring structure. We formalize a parameter, the inverse instantaneous coalescence rate, and show that it is equivalent to a population size only in panmictic models, and is mostly misleading for structured models. We argue that this issue affects all population genetics methods ignoring population structure, which may thus infer population size changes that never took place. We apply our approach to human genomic data.
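To make the n-island example concrete, the following minimal sketch evaluates the inverse instantaneous coalescence rate, taken here as P(T2 > t) divided by the density of T2 at t, where T2 is the coalescence time of the two genomes. The parametrisation is an assumption of this sketch, not something stated in the abstract: n demes of equal constant size, time measured in units of the deme size, each lineage migrating at rate M/2, and both genomes sampled in the same deme. The function name `iicr_n_island` is hypothetical.

```python
import math

def iicr_n_island(t, n, M):
    """IICR at scaled time t for two lineages sampled in the same deme.

    Assumed model: symmetric n-island, coalescence rate 1 within a deme,
    per-lineage migration rate M/2, constant deme sizes.
    """
    g = M / (n - 1)            # rate at which separated lineages re-enter the same deme
    s = 1.0 + M + g
    root = math.sqrt(s * s - 4.0 * g)
    lam1 = (-s + root) / 2.0   # slow eigenvalue of the 2-state generator
    lam2 = (-s - root) / 2.0   # fast eigenvalue
    a00 = -(1.0 + M)           # total exit rate from the "same deme" state
    decay = math.exp((lam2 - lam1) * t)
    # Survival P(T2 > t) and density f(t), both divided by the common
    # factor exp(lam1 * t) / (lam1 - lam2) to stay stable for large t.
    survival = (-1.0 - lam2) - decay * (-1.0 - lam1)
    density = (a00 - lam2) - decay * (a00 - lam1)
    return survival / density

# Illustrative parameter values only: 10 islands, M = 1.
for t in (0.0, 0.5, 2.0, 5.0):
    print(t, iicr_n_island(t, n=10, M=1.0))
```

Under these assumptions the curve starts at 1 (the deme size) and rises towards a plateau equal to -1/lam1, although no population size ever changes. A method that reads this curve as a size history would therefore report a decline towards the present.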
Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles.
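As an illustration of why the folded allele frequency spectrum needs neither phasing nor polarization, here is a minimal sketch that computes it from diploid genotype counts. It is not PopSizeABC's own code: the input layout (one list per SNP, holding 0, 1 or 2 copies of an arbitrary allele per individual) and the function name `folded_afs` are assumptions made for this example.

```python
def folded_afs(genotypes):
    """Count SNPs by minor allele count.

    genotypes: list of SNPs; each SNP is a list with one entry per diploid
    individual giving the number of copies (0, 1 or 2) of an arbitrary allele.
    Returns a list whose index k holds the number of SNPs with minor allele count k.
    """
    n_individuals = len(genotypes[0])
    n_chromosomes = 2 * n_individuals
    spectrum = [0] * (n_individuals + 1)   # minor allele count ranges from 0 to n
    for snp in genotypes:
        count = sum(snp)
        # Folding: which allele was counted does not matter, keep the rarer one.
        minor = min(count, n_chromosomes - count)
        spectrum[minor] += 1
    return spectrum

# Toy data: 4 SNPs typed in 3 individuals.
toy = [[0, 1, 0], [2, 2, 1], [1, 1, 1], [0, 0, 0]]
print(folded_afs(toy))
```

Only per-individual allele counts are used, so no haplotype information is required, and taking the minimum of the two allele counts removes the need to know which allele is ancestral.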
Several inferential methods using genomic data have been proposed to quantify and date population size changes in the history of species. At the same time, an increasing number of studies have shown that population structure can generate spurious signals of population size change. Recently, Mazet et al. (2016) introduced, for a sample size of two, a time-dependent parameter, which they called the IICR (inverse instantaneous coalescence rate). The IICR is equivalent to a population size in panmictic models, but not necessarily in structured models. It is characterised by a temporal trajectory that suggests population size changes, as a function of the sampling scheme, even when the total population size was constant. Here, we extend the work of Mazet et al. (2016) by (i) showing how the IICR can be computed for any demographic model of interest, under the coalescent, (ii) applying this approach to models of population structure (1D and 2D stepping stone, split models, two- and three-island asymmetric gene flow, continent-island models), (iii) stressing the importance of the sampling strategy in generating different histories, (iv) arguing that IICR plots can be seen as summaries of genomic information that can thus be used for model choice or model exclusion, and (v) applying this approach to the question of admixture between humans and Neanderthals. Altogether, these results are potentially important given that the widely used PSMC (pairwise sequentially Markovian coalescent) method of Li and Durbin (2011) estimates the IICR of the sample, not necessarily the history of the populations.
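One way to obtain an IICR for an arbitrary model is to simulate many pairwise coalescence times T2 under that model and estimate the inverse hazard of T2 in time bins. The sketch below is an assumption-laden illustration of that idea rather than the authors' implementation: the estimator (exposure time divided by number of coalescences per bin) is a standard piecewise-constant hazard estimate, the function name `empirical_iicr` is hypothetical, and the T2 values come from a stand-in panmictic model of constant size (exponential with rate 1 in scaled time) instead of a coalescent simulator.

```python
import random

def empirical_iicr(t2_samples, bin_edges):
    """Estimate the IICR in each time bin as exposure time / number of coalescences."""
    estimates = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        events = 0
        exposure = 0.0
        for t in t2_samples:
            if t < lo:
                continue              # pair had already coalesced before this bin
            if t < hi:
                events += 1           # pair coalesces inside the bin
                exposure += t - lo
            else:
                exposure += hi - lo   # pair survives the whole bin
        estimates.append(exposure / events if events else float("nan"))
    return estimates

# Stand-in for simulator output: constant-size panmictic population, scaled time.
rng = random.Random(1)
t2 = [rng.expovariate(1.0) for _ in range(50000)]
edges = [0.0, 0.5, 1.0, 1.5, 2.0]
print(empirical_iicr(t2, edges))
```

For this panmictic stand-in the estimates should stay close to 1 in every bin, in line with the statement that the IICR equals the population size under panmixia. Replacing `t2` with coalescence times simulated under a structured model and a given sampling scheme would give the corresponding IICR trajectory.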