2023
DOI: 10.1101/2023.01.18.524587
Preprint

Fast and robust metagenomic sequence comparison through sparse chaining with skani

Abstract: Sequence comparison algorithms for metagenome-assembled genomes (MAGs) often have difficulties dealing with data that is high-volume or low-quality. We present skani, a method for calculating average nucleotide identity (ANI) using sparse approximate alignments. skani is more accurate than FastANI for comparing incomplete, fragmented MAGs while also being > 20 times faster. For searching a database of > 65,000 prokaryotic genomes, skani takes only seconds per query and 5 GB of memory. skani is a versatil…

Cited by 19 publications (27 citation statements)
References 45 publications
“…Similarly, mutation rates (or ANI) estimated by FracMinHash, CMash and related tools (e.g. Sourmash or skani calculated ANI) are also not metric (58)(59)(60). To solve this "metric" problem, a norm adjusted proximity graph (NAPG) was proposed based on inner product and it shows improvements in terms of both speed and recall using non-metric distances (61).…”
Section: Discussion (mentioning)
confidence: 99%
“…The versions and commands used are summarized in Supplementary Table 1. HyperGen uses k-mer size k = 21, scaled factor S = 1500 as suggested in previous works (Shaw and Yu, 2023;Hera et al, 2023;Brown and Irber, 2016). Our analysis in Section 3.2.1 shows that the HV dimension D = 4096 achieves a good balance between ANI estimation error and sketching complexity.…”
Section: Benchmarking Tools (mentioning)
confidence: 96%
“…For example, Skani needs to store indexing files with a storage size comparable to the original dataset. FastANI encounters out-of-memory issues on large datasets as reported in (Shaw and Yu, 2023).…”
Section: Introduction (mentioning)
confidence: 99%
“…We downloaded four sets of complete references from NCBI: five E. coli strains (with pairwise ANI ranging from 98.4% to 99.5%), five S. aureus strains (ANI ranging from 98.5% to 99.9%), five L. monocytogenes (ANI from 98.6% to 99.8%) and five P. aeruginosa (ANI from 98.7% to 99.5%). Pairwise ANIs were computed using skani (Shaw and Yu 2023). For each bacterial species, we created a benchmarking set with 2-5 strains, with both uniform read depth (30x) or linearly decreasing "staggered" depth (e.g.…”
Section: Overview of the Strainy Approach (mentioning)
confidence: 99%