2019
DOI: 10.1038/s41588-019-0483-y

Inferring whole-genome histories in large population datasets

Abstract: Inferring the full genealogical history of a set of DNA sequences is a core problem in evolutionary biology as it encodes information about the events and forces that have influenced a species. However, current methods are limited, with the most accurate able to process no more than a hundred samples. With data sets consisting of millions of genomes being collected, there is a need for scalable and efficient inference methods to fully utilise these resources. We introduce an algorithm to infer whole-genome his…

citations
Cited by 233 publications
(398 citation statements)
references
References 64 publications
“…We do not know the true genealogies underlying real data, but recent methods are available to estimate them at scale [Kelleher et al, 2019, Speidel et al, 2019]. In Figure 3, we showed that Branch and Site statistics matched well in simulated data.…”
Section: Application To 1000 Genomes Tree Sequences
Citation type: mentioning
confidence: 90%
“…The methods were subsequently extended and refined for forwards-time simulations [Kelleher et al, 2018, Haller et al, 2018], with similarly large efficiency gains. Recent work has shown that tree sequence algorithms can also be used to massively increase the scalability of methods for inferring genome-wide genealogies, and making it possible to infer trees for millions of samples [Kelleher et al, 2019]. The key to the remarkable efficiency of tree sequence algorithms is the way that shared structure in adjacent trees along the genome is encoded.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…To motivate the haplotype network model, we use the example from Kelleher et al (2019) that presents 5 haplotypes spanning 7 biallelic polymorphic sites (Table 1). Note that the 5 haplotypes are just a sample of the 2^7 = 128 possible haplotypes over the 7 sites.…”
Section: Motivating Example
Citation type: mentioning
confidence: 99%
“…As mentioned in the introduction, modeling phenotypic variation as a function of haplotype variation has extensive literature (Templeton et al, 1987; Balding, 2006; Thompson, 2013; Morris and Cardon, 2019). The prime motivation for this work is the recent growth in the generation of large scale genomic datasets and methods to build phylogenies (Kelleher et al, 2019). To this end we aimed to develop a general haplotype network model that could exploit phylogenetic relationships between haplotypes in a computationally efficient way.…”
Section: The Importance Of the Haplotype Network Model
Citation type: mentioning
confidence: 99%