Background Metagenomics caused a quantum leap in microbial ecology. However, the inherent size and complexity of metagenomic data limit its interpretation. The quantification of metagenomic traits in metagenomic analysis workflows has the potential to improve the exploitation of metagenomic data. Metagenomic traits are organisms’ characteristics linked to their performance. They are measured at the genomic level taking a random sample of individuals in a community. As such, these traits provide valuable information to uncover microorganisms’ ecological patterns. The Average Genome Size (AGS) and the 16S rRNA gene Average Copy Number (ACN) are two highly informative metagenomic traits that reflect microorganisms’ ecological strategies as well as the environmental conditions they inhabit. Results Here, we present the ags.sh and acn.sh tools, which analytically derive the AGS and ACN metagenomic traits. These tools represent an advance on previous approaches to compute the AGS and ACN traits. Benchmarking shows that ags.sh is up to 11 times faster than state-of-the-art tools dedicated to the estimation AGS. Both ags.sh and acn.sh show comparable or higher accuracy than existing tools used to estimate these traits. To exemplify the applicability of both tools, we analyzed the 139 prokaryotic metagenomes of TARA Oceans and revealed the ecological strategies associated with different water layers. Conclusion We took advantage of recent advances in gene annotation to develop the ags.sh and acn.sh tools to combine easy tool usage with fast and accurate performance. Our tools compute the AGS and ACN metagenomic traits on unassembled metagenomes and allow researchers to improve their metagenomic data analysis to gain deeper insights into microorganisms’ ecology. The ags.sh and acn.sh tools are publicly available using Docker container technology at https://github.com/pereiramemo/AGS-and-ACN-tools . Electronic supplementary material The online version of this article (10.1186/s12859-019-3031-y) contains supplementary material, which is available to authorized users.
Marine viruses are dominated by phages and have an enormous influence on microbial population dynamics, due to lysis and horizontal gene transfer. The aim of this study is to analyze the occurrence and diversity of phages in the North Sea, considering the virus-host interactions and biogeographic factors. The virus community of four sampling stations were described using virus metagenomics (viromes). The results show that the virus community was not evenly distributed throughout the North Sea. The dominant phage members were identified as unclassified phage group, followed by Caudovirales order. Myoviridae was the dominant phage family in the North Sea, which occurrence decreased from the coast to the open sea. In contrast, the occurrence of Podoviridae increased and the occurrence of Siphoviridae was low throughout the North Sea. The occurrence of other groups such as Phycodnaviridae decreased from the coast to the open sea. The coastal virus community was genetically more diverse than the open sea community. The influence of riverine inflow and currents, for instance the English Channel flow affects the genetic virus diversity with the community carrying genes from a variety of metabolic pathways and other functions. The present study offers the first insights in the virus community in the North Sea using viromes and shows the variation in virus diversity and the genetic information moved from coastal to open sea areas.
Microorganisms produce an immense variety of natural products through the expression of Biosynthetic Gene Clusters (BGCs): physically clustered genes that encode the enzymes of a specialized metabolic pathway. These natural products cover a wide range of chemical classes (e.g., aminoglycosides, lantibiotics, nonribosomal peptides, oligosaccharides, polyketides, terpenes) that are highly valuable for industrial and medical applications1. Metagenomics, as a culture-independent approach, has greatly enhanced our ability to survey the functional potential of microorganisms and is growing in popularity for the mining of BGCs. However, to effectively exploit metagenomic data to this end, it will be crucial to more efficiently identify these genomic elements in highly complex and ever-increasing volumes of data2. Here, we address this challenge by developing the ultrafast Biosynthetic Gene cluster MEtagenomic eXploration toolbox (BiG-MEx). BiG-MEx rapidly identifies a broad range of BGC protein domains, assess their diversity and novelty, and predicts the abundance profile of natural product BGC classes in metagenomic data. We show the advantages of BiG-MEx compared to standard BGC-mining approaches, and use it to explore the BGC domain and class composition of samples in the TARA Oceans3 and Human Microbiome Project datasets4. In these analyses, we demonstrate BiG-MEx’s applicability to study the distribution, diversity, and ecological roles of BGCs in metagenomic data, and guide the exploration of natural products with clinical applications.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.