PUmPER: phylogenies updated perpetually

Izquierdo-Carrasco, Fernando; Cazes, John; Smith, Stephen A.; Stamatakis, Alexandros

doi:10.1093/bioinformatics/btu053

Cited by 22 publications

(21 citation statements)

References 5 publications

“…() developed PHLAWD to conduct a so‐called “baited” analysis where gene regions may be identified a priori thereby dramatically speeding up clustering analyses. This procedure was extended with PUmPER to allow for automatic updating when new sequences become available (Izquierdo‐Carrasco et al., ). Several newly developed software packages have built on these methods including SUMAC (Freyman, ) that incorporates both “baited” analyses and single‐linkage clustering methods as well as a novel means of determining when there are enough overlapping data, and SUPERSMART (Antonelli et al., ) that includes analyses from clustering to divergence‐time estimation.…”

mentioning

confidence: 99%

Constructing a broadly inclusive seed plant phylogeny

Smith

Brown

2018

American J of Botany

Self Cite

801

785

PREMISE OF THE STUDY: Large phylogenies can help shed light on macroevolutionary patterns that inform our understanding of fundamental processes that shape the tree of life. These phylogenies also serve as tools that facilitate other systematic, evolutionary, and ecological analyses. Here we combine genetic data from public repositories (GenBank) with phylogenetic data (Open Tree of Life project) to construct a dated phylogeny for seed plants. METHODS: We conducted a hierarchical clustering analysis of publicly available molecular data for major clades within the Spermatophyta. We constructed phylogenies of major clades, estimated divergence times, and incorporated data from the Open Tree of Life project, resulting in a seed plant phylogeny. We estimated diversification rates, excluding those taxa without molecular data. We also summarized topological uncertainty and data overlap for each major clade. KEY RESULTS: The trees constructed for Spermatophyta consisted of 79,881 and 353,185 terminal taxa; the latter included the Open Tree of Life taxa for which we could not include molecular data from GenBank. The diversification analyses demonstrated nested patterns of rate shifts throughout the phylogeny. Data overlap and inference uncertainty show significant variation throughout and demonstrate the continued need for data collection across seed plants. CONCLUSIONS: This study demonstrates a means for combining available resources to construct a dated phylogeny for plants. However, this approach is an early step and more developments are needed to add data, better incorporating underlying uncertainty, and improve resolution. The methods discussed here can also be applied to other major clades in the tree of life.

“…Pumper [29], which allows an automatic sequence retrieval and tree building, and Sativa [30], which automatically annotates sequences in a tree. Both depend on the quality of the initial input, highlighting the need for high-quality initial annotation as implemented in EukRef.…”

Section: The Eukref Curation Process

mentioning

confidence: 99%

EukRef: phylogenetic curation of ribosomal RNA to enhance understanding of eukaryotic diversity and distribution

Campo

Kolísko

Boscaro

et al. 2018

Preprint

Environmental sequencing has greatly expanded our knowledge of micro-eukaryotic diversity and ecology by revealing previously unknown lineages and their distribution. However, the value of these data is critically dependent on the quality of the reference databases used to assign an identity to environmental sequences. Existing databases contain errors, and struggle to keep pace with rapidly changing eukaryotic taxonomy, the influx of novel diversity, and computational challenges related to assembling the high-quality alignments and trees needed for accurate characterization of lineage diversity. EukRef (eukref.org) is a community driven initiative that addresses these challenges by bringing together taxonomists with expertise spanning the complete eukaryotic tree of life and microbial ecologists that actively use environmental sequencing data for the purpose of developing reliable reference databases across the diversity of microbial eukaryotes. EukRef organizes and facilitates rigorous sequence data mining and annotation by providing protocols, guidelines and tools to do so.

“…An increasing body of work mainly targets such taxonomic identification methods, with recent developments confronting the increasing scalability issues associated with the high dimensions of modern data sets (Barbera et al 2019; Czech et al 2018). Izquierdo-Carrasco et al (2014) implemented an online framework to estimate phylogenetic trees using maximum-likelihood heuristics, which automatically extends an existing alignment when sufficiently new data have been generated and subsequently reconstructs extended phylogenetic trees by using previously inferred smaller trees as starting topologies. The authors compared their methodology to de novo phylogenetic reconstruction and found a slight but consistent improvement in computational performance and a similar topological accuracy.…”

Section: Introduction

mentioning

confidence: 99%

Online Bayesian Phylodynamic Inference in BEAST with Application to Epidemic Reconstruction

Gill

Lemey

Suchard

et al. 2020

Molecular Biology and Evolution

Reconstructing pathogen dynamics from genetic data as they become available during an outbreak or epidemic represents an important statistical scenario in which observations arrive sequentially in time and one is interested in performing inference in an 'online' fashion. Widely-used Bayesian phylogenetic inference packages are not set up for this purpose, generally requiring one to recompute trees and evolutionary model parameters de novo when new data arrive. To accommodate increasing data flow in a Bayesian phylogenetic framework, we introduce a methodology to efficiently update the posterior distribution with newly available genetic data. Our procedure is implemented in the BEAST 1.10 software package, and relies on a distance-based measure to insert new taxa into the current estimate of the phylogeny and imputes plausible values for new model parameters to accommodate growing dimensionality. This augmentation creates informed starting values and re-uses optimally tuned transition kernels for posterior exploration of growing data sets, reducing the time necessary to converge to target posterior distributions. We apply our framework to data from the recent West African Ebola virus epidemic and demonstrate a considerable reduction in time required to obtain posterior estimates at different time points of the outbreak. Beyond epidemic monitoring, this framework easily finds other applications within the phylogenetics community, where changes in the data (in terms of alignment changes, sequence addition or removal) present common scenarios that can benefit from online inference.
