2015
DOI: 10.1101/gr.178756.114

Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data

Abstract: With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exac…

Cited by 89 publications (135 citation statements)
References 47 publications
“…Small samples have the disadvantages of increased noise and limited temporal resolution of analysis. For example, in demographic inference, larger samples are essential for detecting the signal of recent rapid growth of the human population [17,64,65]. Interestingly, we found that samples much smaller than ExAC may also create an unappreciated bias, as we describe next.…”
Section: Results (mentioning)
confidence: 71%
“…For example, most demographic inference algorithms that use the SFS as a summary statistic [e.g. 6,65,77] rely on the infinite-sites model, which is evidently not a valid assumption for large samples. Adjusting demographic inference schemes to include the effects of recurrent mutations on the SFS (for examples of recent efforts towards this goal, see [78–82]) has the potential to significantly improve inference accuracy.…”
Section: Discussion (mentioning)
confidence: 99%
“…However, the inference can be computationally intractable as the number of populations and/or the number of parameters for inference become large and the desired accuracy increases, which is a general disadvantage of simulation-based methods. Bhaskar et al (2015) [18] developed a very efficient method and software (fastNeutrino) for inferring population size changes for a single population. The efficiency of fastNeutrino stems from analytical computation of the SFS for a set of parameter values of a pre-defined model of population size changes, together with a fast optimization technique.…”
Section: Introduction (mentioning)
confidence: 99%
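
To make "analytical computation of the SFS" concrete, here is a minimal sketch, not fastNeutrino's actual code. It evaluates the standard coalescent identity that writes the expected number of sites with i derived alleles in a sample of n as a weighted sum of the expected times E[T_k] during which the genealogy has k lineages. The function names, the dictionary layout for E[T_k], and the time scaling (pairs of lineages coalesce at rate 1, theta is the population-scaled mutation rate) are all assumptions made for illustration. Only the constant-size case is filled in, where the result should reduce to theta / i.

```python
from math import comb


def expected_times_constant_size(n):
    """E[T_k] for k = 2..n under a constant population size.

    Assumed time scaling: each pair of lineages coalesces at rate 1,
    so T_k is exponential with rate k*(k-1)/2.
    """
    return {k: 2.0 / (k * (k - 1)) for k in range(2, n + 1)}


def expected_sfs(n, theta, expected_times):
    """Expected site frequency spectrum for a sample of size n.

    expected_times maps k -> E[T_k]. A variable-size model would only
    change this mapping; the combinatorial weights stay the same.
    Returns a list whose entry i-1 is the expected count of sites
    with i derived alleles, for i = 1..n-1.
    """
    sfs = []
    for i in range(1, n):
        total = 0.0
        for k in range(2, n - i + 2):
            # Probability that a given lineage, present while there are
            # k lineages, subtends exactly i of the n sampled sequences.
            p_subtends_i = comb(n - i - 1, k - 2) / comb(n - 1, k - 1)
            total += k * p_subtends_i * expected_times[k]
        sfs.append(0.5 * theta * total)
    return sfs


if __name__ == "__main__":
    n, theta = 10, 2.0
    spectrum = expected_sfs(n, theta, expected_times_constant_size(n))
    for i, value in enumerate(spectrum, start=1):
        print(i, round(value, 6), round(theta / i, 6))
```

Under these assumptions, fitting a demographic model amounts to choosing the size-history parameters that determine E[T_k] so that the resulting spectrum matches the observed one; the optimization step is not shown here.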
“…However, instead of making assumptions about the number of epochs and shape of each epoch in the history, such as exponential growth as the pre-defined model underlying other inference methods, it approximates historical population size via a non-parametric approach that fits many epochs, each of constant population size, while adding population size changing points (new epoch) until a good fit is obtained. Most recently, similar to fastNeutrino [18], we (2016) [22] developed a method (implemented in EGGS, without inference) that allows an analytical and efficient computation of the SFS for a set of parameters from more generalized models than previously possible, including population growth that is sub- or super-exponential.…”
Section: Introduction (mentioning)
confidence: 99%
“…Methods to study genetic variation, or perform inference, in populations with varying size or more complex demographic histories have been developed based on the Wright-Fisher diffusion, describing the evolution of population allele frequencies forward in time (Griffiths, 2003; Živković et al, 2015; Gutenkunst et al, 2009; Excoffier et al, 2013), or the Coalescent process, a model for the genealogical relationship in a sample of individuals (Griffiths and Tavaré, 1994; Griffiths and Marjoram, 1996; Griffiths and Tavaré, 1998; Živković and Wiehe, 2008; Bhaskar et al, 2015; Kamm et al, 2017). A powerful representation of genetic variation data that has been used in this context is the Site-Frequency-Spectrum.…”
Section: Introduction (mentioning)
confidence: 99%