2014
DOI: 10.1101/013227
Preprint

Scaling probabilistic models of genetic variation to millions of humans

Abstract: A major goal of population genetics is to quantitatively understand variation of genetic polymorphisms among individuals. The aggregated number of genotyped humans is currently on the order of millions of individuals, and existing methods do not scale to data of this size. To solve this problem we developed TeraStructure, an algorithm to fit Bayesian models of genetic variation in structured human populations on tera-sample-sized data sets (10^12 observed genotypes, e.g., 1M individuals at 1M SNPs). TeraStructure…

citations
Cited by 12 publications
(31 citation statements)
references
References 29 publications
“…The Human Genome Diversity Panel (HGDP) includes genomic samples from 52 human populations from across the world (Cavalli-Sforza 2005). This set of populations is particularly well-suited as an example, because it has served as a test set for a variety of population structure methods (Corander et al 2004; Corander and Marttinen 2006; Francois et al 2006; Patterson et al 2006; Nievergelt et al 2007; Corander et al 2008; Hubisz et al 2009; Shringarpure and Xing 2009; Jombart et al 2010; Pickrell and Pritchard 2012; San Lucas et al 2012; Loh et al 2013; Frichot et al 2014; Raj et al 2014; Gopalan et al 2016; Granot et al 2016; Hao et al 2016; Hunley et al 2016; Zheng and Weir 2016). We generated a PST from 938 HGDP individuals, typed at 647,976 SNPs.…”
Section: Population Structure Tree In Humans
mentioning
confidence: 99%
“…Bayesian methods (GOPALAN et al, 2016; PRITCHARD et al, 2000; RAJ et al, 2014) fit the PSD model specifically, while existing maximum likelihood methods (ALEXANDER et al, 2009; TANG et al, 2005) and ALStructure require only the admixture model assumptions.…”
Section: The Admixture Model
mentioning
confidence: 99%
“…(i) the allele frequencies of ancestral populations; (ii) the admixture proportions of each modern individual. Many popular global ancestry estimation methods have been developed within a probabilistic framework. In these methods, which we will refer to as likelihood-based approaches, the strategy is to fit a probabilistic model to the observed genome-wide genotype data by either maximizing the likelihood function (ALEXANDER et al, 2009; TANG et al, 2005) or the posterior probability (GOPALAN et al, 2016; PRITCHARD et al, 2000; RAJ et al, 2014). The probabilistic model fit in each of these cases is the admixture model, described in detail in Section 2.1, in which the global ancestry quantities (i) and (ii) are explicit parameters to be estimated.…”
Section: Introduction
mentioning
confidence: 99%
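
The likelihood that such methods maximize can be sketched as follows. This is only an illustration, assuming the standard binomial form of the admixture model, in which each genotype counts copies of one allele and is drawn as Binomial(2, p_ij) with p_ij = sum_k theta[i][k] * beta[k][j]. The function name, variable names, and toy numbers are hypothetical and are not taken from TeraStructure or from any of the citing papers.

```python
import math

def admixture_log_likelihood(genotypes, theta, beta):
    """Binomial log-likelihood of a genotype matrix under the admixture model.

    genotypes[i][j]: allele count in {0, 1, 2} for individual i at SNP j
    theta[i][k]:     admixture proportion of individual i from population k
                     (each row sums to 1)
    beta[k][j]:      allele frequency of ancestral population k at SNP j
                     (assumed strictly between 0 and 1 so the logs are finite)
    """
    n_pops = len(beta)
    total = 0.0
    for i, row in enumerate(genotypes):
        for j, x in enumerate(row):
            # Individual-specific allele frequency: mixture over populations.
            p = sum(theta[i][k] * beta[k][j] for k in range(n_pops))
            total += (math.log(math.comb(2, x))
                      + x * math.log(p)
                      + (2 - x) * math.log(1.0 - p))
    return total

# Toy data (made up): 2 individuals, 2 SNPs, 2 ancestral populations.
theta = [[1.0, 0.0],   # individual 0: entirely from population 0
         [0.5, 0.5]]   # individual 1: evenly admixed
beta = [[0.1, 0.9],    # population 0 allele frequencies
        [0.7, 0.3]]    # population 1 allele frequencies
genotypes = [[0, 2],
             [1, 1]]

log_lik = admixture_log_likelihood(genotypes, theta, beta)
```

A maximum-likelihood method would search over `theta` and `beta` to make this quantity as large as possible, whereas a Bayesian method would combine it with priors on the same two sets of parameters.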
“…When genome-wide SNP data first became available, difficulties in applying the now classic Bayesian method for admixture inference (STRUCTURE [45]) occasioned the development of fast maximum-likelihood based approaches [47,48]. The past two years have seen the return of Bayesian approaches with two new fast approaches that use variational approximations [49,50]. Impressively, teraSTRUCTURE [50] handles samples containing millions of individuals.…”
Section: Refining and Expanding Models That Handle Human Admixture
mentioning
confidence: 99%
“…The past two years have seen the return of Bayesian approaches with two new fast approaches that use variational approximations [49,50]. Impressively, teraSTRUCTURE [50] handles samples containing millions of individuals. Local ancestry approaches have recently been improved by the development of fast algorithms that leverage rare variants within ancestries [51] and wavelet techniques [52].…”
Section: Refining and Expanding Models That Handle Human Admixture
mentioning
confidence: 99%