2021
DOI: 10.1111/1755-0998.13527

Fast and accurate distance‐based phylogenetic placement using divide and conquer

Abstract: Phylogenetic placement of query samples on an existing phylogeny is increasingly used in molecular ecology, including sample identification and microbiome environmental sampling. As the size of available reference trees used in these analyses continues to grow, there is a growing need for methods that place sequences on ultra‐large trees with high accuracy. Distance‐based placement methods have recently emerged as a path to provide such scalability while allowing flexibility to analyse both assembled and unass…

Cited by 33 publications (46 citation statements)
References 64 publications
“…We now compare accuracy of DEPP to distance-based APPLES-II (Balaban, Jiang, et al, 2021) used with the standard Jukes Cantor (JC) model, maximum likelihood method EPA-ng (Barbera et al, 2019), and the quartet-based discordant-aware method INSTRAL (Rabiee and Mirarab, 2020). Note that APPLES-JC and EPA-ng are not designed for discordant placement using a single gene, and INSTRAL is designed only for datasets with many genes (at least two but ideally many more).…”
Section: Results (mentioning)
confidence: 99%
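The statement above refers to distances computed under the standard Jukes Cantor (JC) model. As a minimal sketch of what such a distance looks like, the block below applies the textbook JC69 correction, d = -(3/4) ln(1 - (4/3) p), to the proportion p of mismatched sites between two aligned sequences. The function names `p_distance` and `jc69_distance` are illustrative only and are not taken from APPLES-II, DEPP, or any other tool named here; how those tools handle gaps, ambiguity codes, or saturation may differ.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of mismatches over aligned sites where both bases are A/C/G/T."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip gaps and ambiguity codes (an assumption of this sketch).
    pairs = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(a: str, b: str) -> float:
    """JC69-corrected distance in expected substitutions per site."""
    p = p_distance(a, b)
    # The correction is undefined once p reaches 3/4 (saturation).
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, `jc69_distance("ACGTACGT", "ACGTACGA")` corrects an observed mismatch proportion of 1/8 upward slightly, since some substitutions are expected to be hidden by repeated changes at the same site.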
“…Other hyperparameters are fixed to their defaults (Table S2) unless otherwise specified. DEPP is trained on the reference tree and is used to compute distances that are then fed to APPLES-II (Balaban, Jiang, et al, 2021). APPLES-II is used identically to APPLES-II+JC (see below).…”
Section: Methods (mentioning)
confidence: 99%
“…APPLES-2 is an improvement on APPLES with respect to accuracy and running time, and also scales to at least 200 000 sequences. Recent studies [81,85,86] show that APPLES and APPLES-2 can run on trees with 200 000 leaves and are much faster than both pplacer and EPA-ng; however, even APPLES-2 does not match the accuracy of pplacer. UShER is parsimony-based and very fast, but has not been compared to pplacer, APPLES, or APPLES-2, while RAPPAS, which is based on k-mers, is very fast but not as accurate as EPA-ng or pplacer [83]).…”
Section: (A) Adding Sequences To Gene Trees (mentioning)
confidence: 99%
“…Other phylogenetic placement methods have been developed that seek to improve scalability to larger trees or reduce running time (e.g., UShER (Turakhia et al, 2021), RAPPAS (Linard et al, 2019), EPA-ng (Barbera et al, 2019), APPLES (Balaban et al, 2020), and APPLES-2 (Balaban et al, 2021)). EPA-ng is likelihood-based and has been optimized for "batch processing" of query sequences (so that the cost of performing phylogenetic placement of a large number of query sequences is much less than the cost of placing them one-by-one).…”
Section: Adding Sequences To Gene Trees (mentioning)
confidence: 99%
“…APPLES-2 is an improvement on APPLES with respect to accuracy and running time, and also scales to at least 200,000 sequences. Recent studies (Balaban et al, 2020, 2021; Wedell et al, 2021) show that APPLES and APPLES-2 can run on trees with 200,000 leaves and are much faster than both pplacer and EPA-ng; however, even APPLES-2 does not match the accuracy of pplacer. UShER is parsimony-based and very fast, but has not been compared to pplacer, APPLES, or APPLES-2, while RAPPAS, which is based on k-mers, is very fast but not as accurate as EPA-ng or pplacer (Linard et al, 2019)).…”
Section: Adding Sequences To Gene Trees (mentioning)
confidence: 99%