The recent advent of DNA sequencing technologies facilitates the use of genome sequencing data that provide means for more informative and precise classification and identification of members of the Bacteria and Archaea. Because the current species definition is based on the comparison of genome sequences between type and other strains in a given species, building a genome database with correct taxonomic information is of paramount need to enhance our efforts in exploring prokaryotic diversity and discovering novel species as well as for routine identifications. Here we introduce an integrated database, called EzBioCloud, that holds the taxonomic hierarchy of the Bacteria and Archaea, which is represented by quality-controlled 16S rRNA gene and genome sequences. Whole-genome assemblies in the NCBI Assembly Database were screened for low quality and subjected to a composite identification bioinformatics pipeline that employs gene-based searches followed by the calculation of average nucleotide identity. As a result, the database is made of 61 700 species/phylotypes, including 13 132 with validly published names, and 62 362 whole-genome assemblies that were identified taxonomically at the genus, species and subspecies levels. Genomic properties, such as genome size and DNA G+C content, and the occurrence in human microbiome data were calculated for each genus or higher taxa. This united database of taxonomy, 16S rRNA gene and genome sequences, with accompanying bioinformatics tools, should accelerate genome-based classification and identification of members of the Bacteria and Archaea. The database and related search tools are available at www.ezbiocloud.net/.
Average nucleotide identity (ANI) is a category of computational analysis that can be used to define species boundaries of Archaea and Bacteria. Calculating ANI usually involves the fragmentation of genome sequences, followed by nucleotide sequence search, alignment, and identity calculation. The original algorithm to calculate ANI used the BLAST program as its search engine. An improved ANI algorithm, called OrthoANI, was developed to accommodate the concept of orthology. Here, we compared four algorithms to compute ANI, namely ANIb (ANI algorithm using BLAST), ANIm (ANI using MUMmer), OrthoANIb (OrthoANI using BLAST) and OrthoANIu (OrthoANI using USEARCH) using >100,000 pairs of genomes with various genome sizes. By comparing values to the ANIb that is considered a standard, OrthoANIb and OrthoANIu exhibited good correlation in the whole range of ANI values. ANIm showed poor correlation for ANI of <90%. ANIm and OrthoANIu runs faster than ANIb by an order of magnitude. When genomes that are larger than 7 Mbp were analysed, the run-times of ANIm and OrthoANIu were shorter than that of ANIb by 53- and 22-fold, respectively. In conclusion, ANI calculation can be greatly sped up by the OrthoANIu method without losing accuracy. A web-service that can be used to calculate OrthoANIu between a pair of genome sequences is available at http://www.ezbiocloud.net/tools/ani . For large-scale calculation and integration in bioinformatics pipelines, a standalone JAVA program is available for download at http://www.ezbiocloud.net/tools/orthoaniu .
Genome-based phylogeny plays a central role in the future taxonomy and phylogenetics of Bacteria and Archaea by replacing 16S rRNA gene phylogeny. The concatenated core gene alignments are frequently used for such a purpose. The bacterial core genes are defined as single-copy, homologous genes that are present in most of the known bacterial species. There have been several studies describing such a gene set, but the number of species considered was rather small. Here we present the up-to-date bacterial core gene set, named UBCG, and software suites to accommodate necessary steps to generate and evaluate phylogenetic trees. The method was successfully used to infer phylogenomic relationship of Escherichia and related taxa and can be used for the set of genomes at any taxonomic ranks of Bacteria. The UBCG pipeline and file viewer are freely available at https://www.ezbiocloud.net/tools/ubcg and https://www.ezbiocloud.net/tools/ubcg_viewer , respectively.
Thanks to the recent advancement of DNA sequencing technology, the cost and time of prokaryotic genome sequencing have been dramatically decreased. It has repeatedly been reported that genome sequencing using high-throughput next-generation sequencing is prone to contaminations due to its high depth of sequencing coverage. Although a few bioinformatics tools are available to detect potential contaminations, these have inherited limitations as they only use protein-coding genes. Here we introduce a new algorithm, called ContEst16S, to detect potential contaminations using 16S rRNA genes from genome assemblies. We screened 69 745 prokaryotic genomes from the NCBI Assembly Database using ContEst16S and found that 594 were contaminated by bacteria, human and plants. Of the predicted contaminated genomes, 8 % were not predicted by the existing protein-coding gene-based tool, implying that both methods can be complementary in the detection of contaminations. A web-based service of the algorithm is available at www.ezbiocloud.net/tools/contest16s.
Airborne microorganisms have significant effects on human health, and children are more vulnerable to pathogens and allergens than adults. However, little is known about the microbial communities in the air of childcare facilities. Here, we analyzed the bacterial and fungal communities in 50 air samples collected from five daycare centers and five elementary schools located in Seoul, Korea using culture-independent high-throughput pyrosequencing. The microbial communities contained a wide variety of taxa not previously identified in child daycare centers and schools. Moreover, the dominant species differed from those reported in previous studies using culture-dependent methods. The well-known fungi detected in previous culture-based studies (Alternaria, Aspergillus, Penicillium, and Cladosporium) represented less than 12% of the total sequence reads. The composition of the fungal and bacterial communities in the indoor air differed greatly with regard to the source of the microorganisms. The bacterial community in the indoor air appeared to contain diverse bacteria associated with both humans and the outside environment. In contrast, the fungal community was largely derived from the surrounding outdoor environment and not from human activity. The profile of the microorganisms in bioaerosols identified in this study provides the fundamental knowledge needed to develop public health policies regarding the monitoring and management of indoor air quality.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.