Species demarcation in Bacteria and Archaea is mainly based on overall genome relatedness, which serves a framework for modern microbiology. Current practice for obtaining these measures between two strains is shifting from experimentally determined similarity obtained by DNA-DNA hybridization (DDH) to genome-sequence-based similarity. Average nucleotide identity (ANI) is a simple algorithm that mimics DDH. Like DDH, ANI values between two genome sequences may be different from each other when reciprocal calculations are compared. We compared 63 690 pairs of genome sequences and found that the differences in reciprocal ANI values are significantly high, exceeding 1 % in some cases. To resolve this problem of not being symmetrical, a new algorithm, named OrthoANI, was developed to accommodate the concept of orthology for which both genome sequences were fragmented and only orthologous fragment pairs taken into consideration for calculating nucleotide identities. OrthoANI is highly correlated with ANI (using BLASTn) and the former showed approximately 0.1 % higher values than the latter. In conclusion, OrthoANI provides a more robust and faster means of calculating average nucleotide identity for taxonomic purposes. The standalone software tools are freely available at http://www.ezbiocloud.net/sw/oat.
Genome-based phylogeny plays a central role in the future taxonomy and phylogenetics of Bacteria and Archaea by replacing 16S rRNA gene phylogeny. The concatenated core gene alignments are frequently used for such a purpose. The bacterial core genes are defined as single-copy, homologous genes that are present in most of the known bacterial species. There have been several studies describing such a gene set, but the number of species considered was rather small. Here we present the up-to-date bacterial core gene set, named UBCG, and software suites to accommodate necessary steps to generate and evaluate phylogenetic trees. The method was successfully used to infer phylogenomic relationship of Escherichia and related taxa and can be used for the set of genomes at any taxonomic ranks of Bacteria. The UBCG pipeline and file viewer are freely available at https://www.ezbiocloud.net/tools/ubcg and https://www.ezbiocloud.net/tools/ubcg_viewer , respectively.
For more than a decade, pan-genome analysis has been applied as an effective method for explaining the genetic contents variation of prokaryotic species. However, genomic characteristics and detailed structures of gene pools have not been fully clarified, because most studies have used a small number of genomes. Here, we constructed pan-genomes of seven species in order to elucidate variations in the genetic contents of >27,000 genomes belonging to Streptococcus pneumoniae , Staphylococcus aureus subsp. aureus , Salmonella enterica subsp. enterica , Escherichia coli and Shigella spp., Mycobacterium tuberculosis complex, Pseudomonas aeruginosa , and Acinetobacter baumannii. This work showed the pan-genomes of all seven species has open property. Additionally, systematic evaluation of the characteristics of their pan-genome revealed that phylogenetic distance provided valuable information for estimating the parameters for pan-genome size among several models including Heaps’ law. Our results provide a better understanding of the species and a solution to minimize sampling biases associated with genome-sequencing preferences for pathogenic strains.
Shotgun metagenomics is of great importance in order to understand the composition of the microbial community associated with a sample and the potential impact it may exert on its host. For clinical metagenomics, one of the initial challenges is the accurate identification of a pathogen of interest and ability to single out that pathogen within a complex community of microorganisms. However, in absence of an accurate identification of those microorganisms, any kind of conclusion or diagnosis based on misidentification may lead to erroneous conclusions, especially when comparing distinct groups of individuals. When comparing a shotgun metagenomic sample against a reference genome sequence database, the classification itself is dependent on the contents of the database. Focusing on the genus Streptococcus, we built four synthetic metagenomic samples and demonstrated that shotgun taxonomic profiling using the bacterial core genes as the reference database performed better in both taxonomic profiling and relative abundance prediction than that based on the marker gene reference database included in MetaPhlAn2. Additionally, by classifying sputum samples of patients suffering from chronic obstructive pulmonary disease, we showed that adding genomes of genomospecies to a reference database offers higher taxonomic resolution for taxonomic profiling. Finally, we show how our genomospecies database is able to identify correctly a clinical stool sample from a patient with a streptococcal infection, proving that genomospecies provide better taxonomic coverage for metagenomic analyses.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.