2019
DOI: 10.1101/771964
Preprint

Selection of representative genomes for 24,706 bacterial and archaeal species clusters provide a complete genome-based taxonomy

Abstract: We recently introduced the Genome Taxonomy Database (GTDB), a phylogenetically consistent, genome-based taxonomy providing rank normalized classifications for nearly 150,000 genomes from domain to genus. However, nearly 40% of the genomes used to infer the GTDB reference tree lack a species name, reflecting the large number of genomes in public repositories without complete taxonomic assignments. Here we address this limitation by proposing 24,706 species clusters which encompass all publicly available bacteri…
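The abstract describes grouping genomes into species clusters. Purely as an illustration of that idea, the sketch below performs a greedy clustering in which each genome joins the closest existing representative at or above an average nucleotide identity (ANI) threshold, or else founds a new cluster. The 95% default, the assumption that input genomes are pre-sorted best-quality first, the tie-breaking, and the names `cluster_species`, `TOY_ANI` and `toy_ani` are all assumptions made here. This is not the procedure used to build GTDB, which the truncated abstract does not spell out.

```python
from typing import Callable, Dict, FrozenSet, List


def cluster_species(
    genomes: List[str],
    ani: Callable[[str, str], float],
    threshold: float = 95.0,
) -> Dict[str, List[str]]:
    """Greedy ANI clustering (illustrative sketch, not GTDB's exact method).

    `genomes` is assumed to be sorted best-quality first, so earlier
    genomes are preferred as cluster representatives.
    """
    clusters: Dict[str, List[str]] = {}
    for genome in genomes:
        # Score the genome against every current representative and keep
        # only those at or above the species threshold.
        hits = [(ani(genome, rep), rep) for rep in clusters]
        hits = [(score, rep) for score, rep in hits if score >= threshold]
        if hits:
            # Join the representative with the highest ANI.
            _, rep = max(hits)
            clusters[rep].append(genome)
        else:
            # No representative is close enough: start a new species cluster.
            clusters[genome] = [genome]
    return clusters


# Hypothetical pairwise ANI values (percent) for four toy genomes.
TOY_ANI: Dict[FrozenSet[str], float] = {
    frozenset({"A", "B"}): 98.5,
    frozenset({"A", "C"}): 80.0,
    frozenset({"A", "D"}): 82.0,
    frozenset({"B", "C"}): 81.0,
    frozenset({"B", "D"}): 83.0,
    frozenset({"C", "D"}): 96.0,
}


def toy_ani(x: str, y: str) -> float:
    """Look up the hypothetical ANI for an unordered genome pair."""
    return TOY_ANI[frozenset({x, y})]


print(cluster_species(["A", "B", "C", "D"], toy_ani))
# prints {'A': ['A', 'B'], 'C': ['C', 'D']}
```

Because the result depends on input order, the (assumed) quality sorting decides which genome ends up as each cluster's representative.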

Cited by 51 publications (65 citation statements). References 63 publications.
“…In those studies, genomes most commonly shared either >97% or <90% average nucleotide identity (ANI). A bacterial species threshold of 95% ANI, originally proposed on the basis of benchmarking with respect to DNA-DNA hybridization values (8), has been gaining increasing support (11) on the basis of that observation. However, it is still unclear whether this pattern is confounded by database biases or whether it reflects a true phenomenon across natural environments, as comparisons of phylogenetically unbalanced genome sets could result in the formation of spurious sequence clusters.…”
mentioning (confidence: 99%)
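The observation in the passage above (pairwise ANI values tend to fall either above 97% or below 90%) can be checked on any set of precomputed pairwise ANI values by counting how many land in each range. The function names, the inclusive ≥95% rule in `same_species`, and the sample values are illustrative assumptions, not data or code from the cited studies.

```python
from collections import Counter
from typing import Iterable


def ani_bins(values: Iterable[float]) -> Counter:
    """Count pairwise ANI values (percent) in the ranges named in the quote."""
    counts: Counter = Counter()
    for v in values:
        if v > 97.0:
            counts[">97"] += 1
        elif v < 90.0:
            counts["<90"] += 1
        else:
            counts["90-97"] += 1  # the sparsely populated "gap" region
    return counts


def same_species(ani: float, threshold: float = 95.0) -> bool:
    """Apply a 95% ANI species cut-off (inclusive here, by assumption)."""
    return ani >= threshold


# Made-up pairwise ANI values for illustration only.
sample = [99.1, 98.0, 96.0, 93.5, 85.2, 78.0, 80.1]
counts = ani_bins(sample)
print(counts[">97"], counts["90-97"], counts["<90"])  # prints 2 2 3
print([same_species(v) for v in sample])
```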
“…Genomes were classified using the standalone GTDB-Tk tool (version 0.3.2) using the classify workflow and the Genome Taxonomy Database version 89 [24]. The tool appears unable to classify genomes less than 10% complete, as indicated in the respective tables.…”
Section: Methods; mentioning (confidence: 99%)
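Since the quoted methods report that GTDB-Tk (version 0.3.2) appeared unable to classify genomes less than 10% complete, one practical step is to flag such genomes before classification. The completeness values (for example, estimates from a genome-quality tool) and the helper name `split_by_completeness` are assumptions; this is a hypothetical pre-filter sketch, not part of GTDB-Tk.

```python
from typing import Dict, List, Tuple

MIN_COMPLETENESS = 10.0  # percent; the limit reported in the quoted methods


def split_by_completeness(
    completeness: Dict[str, float], minimum: float = MIN_COMPLETENESS
) -> Tuple[List[str], List[str]]:
    """Return (classifiable, too_incomplete) genome IDs, each sorted by name."""
    classifiable = sorted(g for g, c in completeness.items() if c >= minimum)
    too_incomplete = sorted(g for g, c in completeness.items() if c < minimum)
    return classifiable, too_incomplete


# Hypothetical completeness estimates (percent).
ok, skipped = split_by_completeness({"bin1": 92.4, "bin2": 7.5, "bin3": 10.0})
print(ok, skipped)  # prints ['bin1', 'bin3'] ['bin2']
```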
“…Genomes carrying fewer than 6 genes (cumulative) in these two categories were collapsed. Phylogeny was inferred using JolyTree [24] with a sketch size of 5 000. Conserved genomes are indicated with an asterisk within highlighted taxonomic groups.…”
mentioning (confidence: 99%)
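JolyTree estimates pairwise genome distances from MinHash sketches (via Mash), which is where the quoted sketch size of 5,000 comes in: each genome is reduced to its smallest k-mer hashes, and distances are estimated from the overlap of those sketches. The sketch below shows the idea in plain Python. The default k-mer length of 21, the SHA-1-based 64-bit hash, the lack of reverse-complement canonicalisation, and all function names are simplifications chosen here; real tools differ in these details. The distance uses the Mash formula D = -(1/k) ln(2j / (1 + j)), where j is the estimated Jaccard similarity.

```python
import hashlib
import math
from typing import List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of a sequence (no reverse-complement handling)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq: str, k: int = 21, size: int = 5000) -> List[int]:
    """Bottom-`size` MinHash sketch: the smallest 64-bit k-mer hashes."""
    hashes = {
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(a: List[int], b: List[int], size: int = 5000) -> float:
    """Estimate Jaccard similarity from two bottom-`size` sketches."""
    union = sorted(set(a) | set(b))[:size]  # bottom-`size` sketch of the union
    if not union:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in union if h in shared) / len(union)


def mash_distance(j: float, k: int = 21) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); 1.0 when nothing is shared."""
    if j <= 0.0:
        return 1.0
    return max(0.0, -math.log(2.0 * j / (1.0 + j)) / k)


# Toy usage with short sequences, so k and the sketch size are scaled down.
genome_a = "ACGTTGCAAGGCTTACCGATCGATCGGATCCTAGGCTAACGTT"
genome_b = genome_a[:20] + "T" + genome_a[21:]  # one substitution
sa, sb = sketch(genome_a, k=5, size=50), sketch(genome_b, k=5, size=50)
j = jaccard_estimate(sa, sb, size=50)
print(round(j, 3), round(mash_distance(j, k=5), 4))
```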
“…GTDBtk (version 0.3.2) [70] was used to place the bacterial genomes of LMO6, LMO8 and LMO9 in the Genome Taxonomy Database [71] tree, which was conducted through analysis of a set of marker genes, and calculate shared average nucleotide identity with the reference genomes. The tree was visualised with Dendroscope (version 3.5.9) [72].…”
Section: Phylogenetic Analysis; mentioning (confidence: 99%)
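The passage above reports average nucleotide identity against reference genomes. Conceptually, ANI is the mean identity over the genome fragments that align between two genomes, and it is usually reported together with the fraction of fragments that aligned. The helper below assumes per-fragment identities have already been produced by an aligner, with `None` marking fragments that did not align; the name `ani_from_fragments` and the `min_identity` parameter are hypothetical, and this is a simplified illustration rather than a reimplementation of the tool used in the quoted study.

```python
from statistics import mean
from typing import List, Optional, Tuple


def ani_from_fragments(
    identities: List[Optional[float]], min_identity: float = 0.0
) -> Tuple[float, float]:
    """Return (ANI, aligned fraction) from per-fragment identities in percent.

    Fragments that did not align are given as None; the optional
    min_identity discards weak hits (an illustrative parameter).
    """
    if not identities:
        return 0.0, 0.0
    aligned = [x for x in identities if x is not None and x >= min_identity]
    if not aligned:
        return 0.0, 0.0
    # ANI: mean identity of aligned fragments; AF: share of fragments aligned.
    return mean(aligned), len(aligned) / len(identities)


# Hypothetical identities for four query fragments; the third did not align.
ani, af = ani_from_fragments([99.0, 98.0, None, 97.0])
print(ani, af)  # prints 98.0 0.75
```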
“…Phylogenomic placement of the three strains from this study (LMO6, LMO8, and LMO9) in the GTDB phylogeny [71]. The three strains were placed with GTDBtk [70] and here visualised together with their closest Flavobacterium spp.…”
mentioning (confidence: 99%)