High animal and plant richness in tropical rainforest communities has long intrigued naturalists. It is unknown whether similar hyperdiversity patterns are reflected at the microbial scale with unicellular eukaryotes (protists). Here we show, using environmental metabarcoding of soil samples and a phylogeny-aware cleaning step, that protist communities in Neotropical rainforests are hyperdiverse and dominated by the parasitic Apicomplexa, which infect arthropods and other animals. These host-specific parasites potentially contribute to the high animal diversity in the forests by reducing population growth in a density-dependent manner. By contrast, too few operational taxonomic units (OTUs) of Oomycota were found to broadly drive high tropical tree diversity in a host-specific manner under the Janzen-Connell model. Extremely high OTU diversity and high heterogeneity between samples within the same forests suggest that protists, not arthropods, are the most diverse eukaryotes in tropical rainforests. Our data show that protists play a large role in tropical terrestrial ecosystems long viewed as being dominated by macroorganisms.

Since the works of early naturalists such as von Humboldt and Bonpland 1, we have known that animal and plant communities in tropical rainforests are exceedingly species rich. For example, one hectare can contain more than 400 tree species 2 and a single tree can harbour more than 40 ant species 3. This hyperdiversity of trees has been partially explained by the Janzen-Connell model 4,5, which hypothesizes that host-specific predators and parasites reduce plant population growth in a density-dependent manner 6,7. Sampling up in the tree canopies and below on the ground has further led to the view that arthropods are the most diverse eukaryotes in tropical rainforests 8,9. The focus on eukaryotic macroorganisms in these studies is primarily because they are familiar and readily observable to us.
We do not know whether the less familiar and less readily observable protists (microbial eukaryotes that are not animals, plants or fungi 10) inhabiting these same ecosystems exhibit similar diversity patterns. To evaluate whether macroorganismic diversity patterns are reflected at the microbial scale with protists, we conducted an environmental DNA metabarcoding study by sampling soils at 279 locations in a variety of lowland Neotropical forest types at La Selva Biological Station, Costa Rica; Barro Colorado Island, Panama; and Tiputini Biodiversity Station, Ecuador. This metabarcoding approach has the power to uncover known and new taxa on a massive scale 11. By amplifying DNA extracted from the soils with broadly targeted primers for the V4 region of the 18S rRNA gene and sequencing the amplicons on the Illumina MiSeq platform, we were able to detect most eukaryotic lineages and to assess the diversity and relative dominance of free-living and parasitic lineages.
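The OTU diversity discussed above rests on grouping sequencing reads into operational taxonomic units. The clustering method used in this study is not described here, so the following is only a minimal, hypothetical sketch of one generic approach: greedy centroid clustering at a fixed identity threshold. It assumes the reads are already aligned to equal length (real amplicon pipelines do not require this), and the function names `identity` and `greedy_otus` are illustrative, not taken from any specific tool.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads: List[str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Cluster unique reads into OTUs keyed by their centroid (seed) sequence."""
    # Dereplicate: count how often each identical read occurs.
    counts: Dict[str, int] = {}
    for read in reads:
        counts[read] = counts.get(read, 0) + 1
    # Most abundant unique sequences seed OTUs first; ties broken alphabetically.
    uniques = sorted(counts, key=lambda s: (-counts[s], s))
    otus: Dict[str, List[str]] = {}
    for seq in uniques:
        for centroid, members in otus.items():
            if identity(seq, centroid) >= threshold:
                members.append(seq)  # join the first sufficiently similar OTU
                break
        else:
            otus[seq] = [seq]  # no match: this sequence founds a new OTU
    return otus


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA", "TTTTACGTAC"]
    for centroid, members in greedy_otus(reads, threshold=0.9).items():
        print(centroid, members)
```

Seeding with the most abundant sequences reflects the common assumption that abundant reads are more likely to be genuine than sequencing errors; the threshold value itself is a parameter, not a biological constant.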
Next Generation Sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. Phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the Evolutionary Placement Algorithm (EPA) included in RAxML, or pplacer, are increasingly being used for this purpose. However, due to the steady progress in NGS technologies, the current implementations face substantial scalability limitations. Here we present EPA-ng, a complete reimplementation of the EPA that is substantially faster, offers a distributed memory parallelization, and integrates concepts from both RAxML-EPA and pplacer. EPA-ng can be executed on standard shared memory, as well as on distributed memory systems (e.g., computing clusters). To demonstrate the scalability of EPA-ng we placed 1 billion metagenetic reads from the Tara Oceans Project onto a reference tree with 3,748 taxa in just under 7 hours, using 2,048 cores. Our performance assessment shows that EPA-ng outperforms RAxML-EPA and pplacer by up to a factor of 30 in sequential execution mode, while attaining comparable parallel efficiency on shared memory systems. We further show that the distributed memory parallelization of EPA-ng scales well up to 2,048 cores. EPA-ng is available under the AGPLv3 license: https://github.com/Pbdas/epa-ng.
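Phylogenetic placement evaluates each query sequence on every branch of the reference tree. A common way to summarize the resulting per-branch log-likelihoods is to normalize them into likelihood weight ratios (LWRs), which sum to 1 over all branches. The sketch below assumes those log-likelihoods have already been computed by a placement tool; it is not code from EPA-ng, RAxML-EPA or pplacer, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def likelihood_weight_ratios(branch_logliks: List[float]) -> List[float]:
    """Turn one query's per-branch log-likelihoods into weights that sum to 1."""
    top = max(branch_logliks)
    # Subtracting the maximum before exponentiating avoids underflow: raw
    # log-likelihoods for long alignments are large negative numbers, and
    # exp() of each would otherwise round to 0.0.
    weights = [math.exp(ll - top) for ll in branch_logliks]
    total = sum(weights)
    return [w / total for w in weights]


def best_placement(branch_logliks: List[float]) -> Tuple[int, float]:
    """Index of the most likely insertion branch and its LWR."""
    lwrs = likelihood_weight_ratios(branch_logliks)
    best = max(range(len(lwrs)), key=lwrs.__getitem__)
    return best, lwrs[best]


if __name__ == "__main__":
    logliks = [-1000.0, -1000.0 - math.log(3.0), -2000.0]
    print(likelihood_weight_ratios(logliks))  # roughly [0.75, 0.25, 0.0]
    print(best_placement(logliks))
```

Without the max-subtraction step, all three weights in the example would underflow to zero and the normalization would divide by zero.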
Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies, using as an example a data snapshot comprising a quality-filtered subset of 8,736 out of all 16,453 virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that attempts to root the inferred phylogeny with some degree of confidence, either via the bat and pangolin outgroups or by applying novel computational methods to the ingroup phylogeny, do not yield credible results. Finally, an automatic classification of the current sequences into subclasses using the mPTP tool for molecular species delimitation is, as might be expected, also not possible, because the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution.
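One way to see why many sequences combined with few mutations make inference hard is to count parsimony-informative sites: alignment columns in which at least two character states each occur in at least two sequences. A mutation seen in only one sequence cannot group sequences together, so near-identical genomes can yield very few informative columns. The helper below is a hypothetical illustration of this standard definition, not part of the analysis reviewed here; treating `-`, `N` and `?` as missing data is an assumption.

```python
from collections import Counter
from typing import List

# Characters treated as missing data rather than as a nucleotide state (assumption).
MISSING = set("-N?")


def parsimony_informative_sites(alignment: List[str]) -> int:
    """Count columns with at least two states that each occur in at least two sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    informative = 0
    for col in range(length):
        states = Counter(
            seq[col].upper() for seq in alignment if seq[col].upper() not in MISSING
        )
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative += 1
    return informative


if __name__ == "__main__":
    near_identical = ["ACGTA", "ACGTA", "ACGTA", "ATGTA"]  # one singleton mutation
    print(parsimony_informative_sites(near_identical))  # 0
```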
Next Generation Sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. To achieve this, phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the Evolutionary Placement Algorithm (EPA) included in RAxML, or pplacer, are increasingly being used for this purpose. However, due to the steady progress in NGS technologies, the current implementations face substantial scalability limitations. Here we present EPA-ng, a complete reimplementation of the EPA that is substantially faster, offers a distributed memory parallelization, and integrates concepts from both RAxML-EPA and pplacer. EPA-ng can be executed on standard shared memory, as well as on distributed memory systems (e.g., computing clusters). To demonstrate the scalability of EPA-ng we placed 1 billion metagenetic reads from the Tara Oceans Project onto a reference tree with 3,748 taxa in just under 7 hours, using 2,048 cores. Our performance assessment shows that EPA-ng outperforms RAxML-EPA and pplacer by up to a factor of 30 in sequential execution mode, while attaining comparable parallel efficiency on shared memory systems. We further show that the distributed memory parallelization of EPA-ng scales well up to 3,520 cores. EPA-ng is available under the AGPLv3 license: https://github.com/Pbdas/epa-ng

(Keywords: phylogenetics; phylogenetic placement; metagenomics; metabarcoding; microbiome)

In the last decade, advances in genetic sequencing technologies have drastically reduced the price of decoding DNA and dramatically increased the amount of available DNA data. The Tara Oceans Project (Sunagawa et al. 2015), for example, has generated hundreds of billions of environmental sequences. Moreover, sequencing costs are decreasing at a significantly higher rate than computer performance is increasing according to Moore's law. Therefore, state-of-the-art bioinformatics software is facing a grand scalability challenge.

A common metagenetic data analysis step is to infer the microbiological composition of a given sample. This can be done, for instance, by determining the best hit for each query sequence (QS) in a database of reference sequences (RSs), using sequence similarity measures, and by subsequently assigning the taxonomic label of the chosen RS to the QS. However, approaches based on sequence similarity neither provide nor use phylogenetic information about the QS. This can decrease identification accuracy (Koski and Golding 2001), especially when the QSs are only distantly related to the RSs, for example when more closely related RSs are simply not available.

Phylogenetic placement algorithms alleviate this problem by placing a QS onto a reference t...
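The similarity-based best-hit assignment described above can be sketched as follows. This is a deliberately simplified, hypothetical illustration: it scores pre-aligned, equal-length sequences by the fraction of identical positions, whereas real pipelines typically rely on alignment-based database search. The sequences, identifiers and the taxon names used as labels are toy values, and the function names are illustrative.

```python
from typing import Dict, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; assumes pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_hit_label(query: str, references: Dict[str, Tuple[str, str]]) -> Tuple[str, str, float]:
    """Return (RS id, taxonomic label, identity) for the RS most similar to the QS."""
    rs_id, (rs_seq, label) = max(
        references.items(), key=lambda item: identity(query, item[1][0])
    )
    # The label is assigned however low the best identity is: the method has
    # no notion of where the QS would sit on a phylogeny.
    return rs_id, label, identity(query, rs_seq)


if __name__ == "__main__":
    refs = {
        "rs1": ("ACGTACGT", "Apicomplexa"),
        "rs2": ("TTGTACGA", "Oomycota"),
    }
    print(best_hit_label("ACGTACGA", refs))  # ('rs1', 'Apicomplexa', 0.875)
```

A distantly related QS such as `GGGGGGGG` still receives a label here, at only 25% identity, which illustrates the accuracy problem noted above when closely related RSs are missing.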