2023
DOI: 10.1101/2023.11.06.565843
Preprint

Fast, lightweight, and accurate metagenomic functional profiling using FracMinHash sketches

Mahmudur Rahman Hera,
Shaopeng Liu,
Wei Wei
et al.

Abstract: Motivation: Functional profiling of metagenomic samples is essential to decipher the functional capabilities of these microbial communities. Traditional and more widely used functional profilers in the context of metagenomics rely on aligning reads against a known reference database. However, aligning sequencing reads against a large and fast-growing database is computationally expensive. In general, k-mer-based sketching techniques have been successfully used in metagenomics to address this bottleneck, notably i…
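
The k-mer sketching idea named in the title can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the parameters `k` and `scaled`, and the use of BLAKE2b as a 64-bit hash are all assumptions made for illustration. It keeps only k-mers whose hash falls below a fixed fraction of the hash space, then estimates how much of a query sketch is contained in a reference sketch.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space (assumed for this sketch)


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b is a stand-in hash, an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash(seq, k=7, scaled=4):
    """Keep the hashes that fall below MAX_HASH / scaled (about 1/scaled of all k-mers)."""
    threshold = MAX_HASH // scaled
    return {h for h in map(hash_kmer, kmers(seq, k)) if h < threshold}


def containment(query_sketch, ref_sketch):
    """Fraction of the query sketch that also appears in the reference sketch."""
    if not query_sketch:
        return 0.0
    return len(query_sketch & ref_sketch) / len(query_sketch)
```

Because every sequence is filtered with the same threshold, sketches built with the same `k` and `scaled` values are directly comparable, and a sketch built with a larger `scaled` value is a subset of one built with a smaller value.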

Citations: cited by 5 publications (4 citation statements)
References: 100 publications (127 reference statements)
“… [249] BBMap [250] , Bowtie2 [251] , [252] , BWA [253] , [254] , [255] , iMOKA [256] , MiniMap2 [71] Taxonomic Classification In sequence composition-based methods, the frequency and distribution of k-mers in metagenomic data are analyzed to assess genome similarity across various taxonomic ranks. [257] , [258] ARK [259] , BinDash [122] , Bracken [260] , CDKAM [261] , CLARK [262] , Dashing [124] , fmh-funprofiler [128] , Genometa [263] , Kaiju [138] , KMCP [264] , KmerFinder [265] , Kraken2 [136] , KrakenUniq [19] , LMAT [266] , Mash [72] , Mash Screen [34] , Matchtigs [267] , MetaCache [268] , MetaPalette [269] , MetaProFi [50] , NIQKI [126] , SEK [270] , StrainSeeker [271] , SuperSampler [127] , TACOA [272] , Taxonomer [273] , TETRA [274] , VirFinder [18] , WGSQuikr [275] Phylogeny Reconstruction Pairwise evolutionary distances between protein or nucleic acid sequences and phylogenetic distances can be estimated from the number of k-mer matches between two sequences. Alignment-free sequence comparison quantifies distance using the decay of the number of k-mer matches between two sequences and compares the results to known phylogenetic trees.…”
Section: Applications Of K-mers (citation type: mentioning)
confidence: 99%
“…However, Kmer-db is roughly 26 times faster than Mash and is subsequently better equipped to process larger datasets [121] . Additional sketch-based methods that utilize k-mers in comparative genomics include Bindash 1.0 [122] and 2.0 [123] , Dashing 1.0 [124] and 2.0 [125] , NIQKI [126] , SuperSampler [127] , and fmh-funprofiler [128] ( Table 2 ).…”
Section: Applications Of K-mers (citation type: mentioning)
confidence: 99%
“…Each of these samples were converted into a functional profile in the form of a probability vector indexed by KOs, representing the abundances of the KOs in the sample. This is done using FracMinHash, [9] a sketch-based pipeline that uses sourmash methods to estimate the abundance of each KO present in each sample. The details for this process can be found in Supplement Section 3.2.…”
Section: Functional Comparison Among Body Sites (citation type: mentioning)
confidence: 99%
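
The "probability vector indexed by KOs" in the statement above can be pictured with a small example. This is only a sketch under assumptions: the function name `to_profile` is hypothetical, the abundance values are made up, and the citing paper's actual procedure is in its Supplement Section 3.2, which is not reproduced here. It simply rescales per-KO abundance estimates so that they sum to 1.

```python
def to_profile(ko_abundances):
    """Normalize per-KO abundance estimates into a probability vector.

    ko_abundances maps KO identifiers to non-negative abundance estimates
    (illustrative input; how the estimates are obtained is out of scope).
    """
    total = sum(ko_abundances.values())
    if total <= 0:
        raise ValueError("at least one KO must have positive abundance")
    return {ko: abundance / total for ko, abundance in ko_abundances.items()}


# Made-up abundances for three KO identifiers, for illustration only.
sample = {"K00001": 2.0, "K00002": 1.0, "K00003": 1.0}
profile = to_profile(sample)
```

Normalizing this way makes samples with different sequencing depths comparable, since each profile describes relative rather than absolute abundance.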
“…This procedure can be adapted with the help of branch lengths assignment to answer a different yet equally meaningful question in metagenomic studies: the difference in functions that microbial communities are capable of performing in two given environments regardless of the actual organisms that carry out those functions. To do so, instead of clustering DNA into OTUs, we clustered them into functional units of orthologous genes through a process called functional profiling [9]. Next, instead of a phylogenetic tree, we obtained the KEGG Orthology (KO) hierarchy from the KEGG database [13,11,12].…”
Section: Introduction (citation type: mentioning)
confidence: 99%