2018
DOI: 10.1109/tcbb.2017.2757930

Species Tree Estimation Using ASTRAL: How Many Genes Are Enough?

Abstract: Species tree reconstruction from genomic data is increasingly performed using methods that account for sources of gene tree discordance such as incomplete lineage sorting. One popular method for reconstructing species trees from unrooted gene tree topologies is ASTRAL. In this paper, we derive theoretical sample complexity results for the number of genes required by ASTRAL to guarantee reconstruction of the correct species tree with high probability. We also validate those theoretical bounds in a simulation st…

Citations: cited by 34 publications (55 citation statements)
References: 56 publications

“…On the other hand, when ILS is high enough, the only chance of estimating the species tree well is to have a large sample of gene trees; hence, unless the sample of genes is large enough to allow many genes to be retained after filtering, deleting genes will be detrimental. Therefore, the data quantity requirement depends not only on the method but also on the model condition, and most species tree methods probably require more data under high ILS model conditions for highly accurate species trees to be estimated (Shekhar et al 2017).…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…Consequently, effective gene filtering that improves species tree estimation requires finding a balance between data quantity versus data quality (Molloy and Warnow 2018). For low ILS, a few highly accurate gene trees are sufficient to estimate the true species tree (Shekhar et al 2017). This is evident, for example, in the concatenated analyses, which do not consider gene tree heterogeneity and hence do not account for ILS, and did not show changes in support or topology after filtering despite lower number of loci or taxa.…”
Section: Performance Of Filtering Strategies
Citation type: mentioning
confidence: 99%
“…Many methods are available for combining gene trees (e.g., Kubatko et al, 2009; Liu et al, 2009; Mossel and Roch, 2010; Liu et al, 2010; Chaudhary et al, 2010; Liu and Yu, 2011; Wu, 2012; Bayzid et al, 2013; Sayyari and Mirarab, 2016a), and many of them are statistically consistent under various models of genome evolution. In particular, many of the summary methods have established statistical guarantees (Liu et al, 2010; Allman et al, 2016; Shekhar et al, 2017) under the multi-species coalescent model (Pamilo and Nei, 1988; Rannala and Yang, 2003), which can generate incomplete lineage sorting (ILS) (Degnan and Rosenberg, 2009). Several summary methods, including ASTRAL (Mirarab et al, 2014b), NJst/ASTRID (Liu and Yu, 2011; Vachaspati and Warnow, 2015), and MP-EST (Liu et al, 2010) are in wide use.…”
Citation type: mentioning
confidence: 99%