2022
DOI: 10.1038/s41592-022-01431-4

Critical Assessment of Metagenome Interpretation: the second round of challenges

Abstract: Evaluating metagenomic software is key for optimizing metagenome interpretation and focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, s…

Cited by 212 publications (244 citation statements)
References: 74 publications
“…These previous studies have also used databases composed of only bacteria [1], only bacterial, archaeal, and viral genomes with complete assemblies [3], a MiniKraken database [1], or do not give full details on what is included in their database [2,4,5,7,9,17,18]. Additionally, while some other studies have used the NCBI non-redundant nucleotide database [6,8], we are not aware of any studies that have used the full NCBI RefSeq database that we have here, which is likely to have led to significantly worse performance in those previous comparisons (Fig. 4).…”
Section: Discussion (mentioning)
confidence: 99%
“…Due to the samples that were constructed by both the CAMI (n = 10) [5] and CAMI2 (n = 180) [6] studies using newly sequenced genomes, not all genomes – and therefore not all reads – have classifications at all taxonomic ranks. For some genomes, the lowest rank of the taxonomic classification given is at the family or genus level.…”
Section: Methods (mentioning)
confidence: 99%
“…In contrast, alignment-based methods show high tolerance for base variation. Marker gene approaches perform well in the identification of archaea and bacteria (Meyer et al., 2022), while it is difficult to accurately identify viruses with this strategy since viruses do not have universally conserved genes, such as the 16S and 18S rRNA genes (Breitwieser et al., 2019).…”
Section: Introduction (mentioning)
confidence: 99%
“…These assemblers include meta-IDBA (Peng et al, 2011), metaSPAdes (Nurk et al, 2017), MEGAHIT (Li et al, 2016), and many others. Several recent studies have provided a comprehensive comparison of the computational performance and accuracy of these assemblers (Sczyrba et al, 2017; Vollmers et al, 2017; Meyer et al, 2021). While most of these assemblers can efficiently take advantage of the modern CPU’s multiple processing capabilities, they are limited on a single computer node and, therefore, are not able to assemble very large datasets due to the limited memory capacity.…”
Section: Introduction (mentioning)
confidence: 99%