2022
DOI: 10.1101/2022.01.11.475838
Preprint

Lightweight compositional analysis of metagenomes with FracMinHash and minimum metagenome covers

Abstract: The identification of reference genomes and taxonomic labels from metagenome data underlies many microbiome studies. Here we describe two algorithms for compositional analysis of metagenome sequencing data. We first investigate the FracMinHash sketching technique, a derivative of modulo hash that supports Jaccard containment estimation between sets of different sizes. We implement FracMinHash in the sourmash software, evaluate its accuracy, and demonstrate large-scale containment searches of metagenomes using …
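
As a rough illustration of the sketching idea summarized in the abstract, the following Python sketch keeps only the k-mers whose hash falls below 1/scaled of the hash space, then estimates containment from the retained hashes. The function names, the choice of BLAKE2b as the hash, and the omission of reverse-complement canonicalization are assumptions made for this example; it is not the sourmash implementation.

```python
from __future__ import annotations

import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space used in this example


def kmer_hash(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer (illustrative hash, an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash(seq: str, ksize: int, scaled: int) -> set[int]:
    """Keep the hashes that fall in the lowest 1/scaled of the hash space."""
    threshold = MAX_HASH // scaled
    retained = set()
    for i in range(len(seq) - ksize + 1):
        h = kmer_hash(seq[i:i + ksize])
        if h < threshold:
            retained.add(h)
    return retained


def containment(query: set[int], subject: set[int]) -> float:
    """Estimate the fraction of the query's k-mers that occur in the subject."""
    if not query:
        return 0.0
    return len(query & subject) / len(query)
```

Because the keep-or-drop decision depends only on the hash value, the sketch of a subsequence is always a subset of the sketch of the full sequence at the same `scaled` value, which is what makes containment comparable between sets of very different sizes.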

Cited by 51 publications (91 citation statements)
References 59 publications
“…We refer to FracMinHash sketches as sketches or k-mer abundance profiles, and for simplicity, continue referring to the sub-sampled k-mers in a sketch as k-mers. Retaining only k-mers associated with IBD, we used a minimum set cover approach to identify the genomes that best encompassed these k-mers [21].…”
Section: Results (mentioning)
confidence: 99%
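
The minimum set cover step mentioned in this statement can be pictured with a greedy approximation, a common heuristic for set cover. The sketch below is a minimal version under stated assumptions: `greedy_cover`, its `min_overlap` parameter, and the toy hash sets are hypothetical, and the citing paper's exact procedure is not given here.

```python
from __future__ import annotations


def greedy_cover(
    query: set[int],
    references: dict[str, set[int]],
    min_overlap: int = 1,
) -> list[tuple[str, int]]:
    """Repeatedly pick the reference covering the most still-uncovered hashes."""
    remaining = set(query)
    cover = []
    while remaining:
        best_name, best_overlap = None, 0
        # Sorted iteration makes tie-breaking deterministic.
        for name in sorted(references):
            overlap = len(references[name] & remaining)
            if overlap > best_overlap:
                best_name, best_overlap = name, overlap
        if best_name is None or best_overlap < min_overlap:
            break  # nothing left that explains the remaining hashes
        cover.append((best_name, best_overlap))
        remaining -= references[best_name]
    return cover


# Toy example: integers stand in for sketched k-mer hashes.
query = set(range(1, 11))
references = {"A": {1, 2, 3, 4, 5, 6}, "B": {5, 6, 7, 8}, "C": {9}, "D": {1, 2}}
print(greedy_cover(query, references))
```

Each hash is credited to only one reference, so a genome whose k-mers are already explained by an earlier pick (here `D`) drops out of the cover.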
“…Using these trimmed reads, we generated FracMinHash signatures for each library using sourmash (k-size 31, scaled 2000, abundance tracking on) [57]. FracMinHash sketching produces compressed representations of k-mers in a metagenome while retaining the sequence diversity in a sample [21,23]. This approach creates a consistent set of k-mers across samples by retaining the same k-mers when the same k-mers were observed.…”
Section: Methods (mentioning)
confidence: 99%
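
The consistency property described in this statement, together with abundance tracking, can be sketched in a few lines of Python. This is an illustration only: `abundance_sketch` and the BLAKE2b-based `kmer_hash` are assumptions for the example, not sourmash functions, and reverse-complement handling is left out.

```python
import hashlib
from collections import Counter


def kmer_hash(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer (illustrative hash, an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def abundance_sketch(reads: list, ksize: int, scaled: int) -> Counter:
    """Count occurrences of each retained k-mer hash across all reads."""
    threshold = 2**64 // scaled  # keep the lowest 1/scaled of the hash space
    counts = Counter()
    for read in reads:
        for i in range(len(read) - ksize + 1):
            h = kmer_hash(read[i:i + ksize])
            if h < threshold:
                counts[h] += 1
    return counts


# Two hypothetical samples; any k-mer present in both is kept in both
# sketches or dropped from both, because retention depends only on its hash.
sample_a = abundance_sketch(["ACGTACGTAC"], ksize=4, scaled=2)
sample_b = abundance_sketch(["TTACGTACGG"], ksize=4, scaled=2)
```

The quoted study used k = 31 and scaled = 2000; the small values above are chosen only so the toy reads produce a non-trivial sketch.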
“…We binned the resultant assemblies using metabat2 with parameter [60]. We assigned GTDB species to each bin using sourmash (DNA, k = 31, scaled = 2000) against the GTDB rs202 database, selecting the species of the best match [19]. We decontaminated each bin with charcoal using default parameters [61].…”
Section: Methods (mentioning)
confidence: 99%
“…The ability to distinguish between species without alignment or assembly has popularized the use of k-mers for metagenome analysis, primarily through lightweight sketching and compact de Bruijn assembly graphs (cDBGs). Lightweight sketching facilitates fast and accurate sequence comparisons between potentially large data sets through random but consistent sub-sampling [18,19]. cDBGs maintain connectivity between k-mers and organize them into species-specific neighborhoods [20,21].…”
Section: Introduction (mentioning)
confidence: 99%
“…Sourmash (Irber et al, 2022) outperformed other tools in the Critical Assessment of Metagenome Interpretation (CAMI) based on mouse gut datasets (Meyer et al, 2019) using FracMinHash sketches with a scale of 10,000 calculated from all 141,677 prokaryotic genomes in a RefSeq snapshot. The sketching algorithm uses a small number of signatures from whole-genome sequences; therefore, it can index a large number of reference genomes.…”
Section: Introduction (mentioning)
confidence: 99%