2023
DOI: 10.1093/bioinformatics/btad101

MetaProFi: an ultrafast chunked Bloom filter for storing and querying protein and nucleotide sequence data for accurate identification of functionally relevant genetic variants

Abstract: Motivation: Bloom filters are a popular data structure that allows rapid searches in large sequence datasets. So far, all tools work with nucleotide sequences; however, protein sequences are conserved over longer evolutionary distances, and only mutations on the protein level may have any functional significance. Results: We present MetaProFi, a Bloom filter-based tool that, for the first time, offers the functionality to build…

Citations: cited by 7 publications (9 citation statements)
References: 15 publications

“…We evaluated the performances of kmindex together with eight state-of-the-art k-mer indexers: themisto [2]; ggcat [7]; HIBF [18]; PAC [17]; MetaProFi [22]; MetaGraph [13]; Bifrost [12]; and COBS [3]. The dataset for this benchmark is composed of metagenomic seawater sequencing data from 50 Tara Oceans samples, of 1.4TB of gzipped fastq files.…”
Section: Comparative Results Indexing 50 Metagenomic Seawater Samples
Citation type: mentioning
confidence: 99%
“…We evaluated the performance of kmindex together with state-of-the-art k-mer indexers MetaGraph [11], MetaProFi [20], and PAC [16]. We first indexed raw metagenomic seawater sequencing data from 50 Tara Oceans samples, composed of 1.4TB of gzipped fastq files.…”
Section: Comparative Results Indexing 50 Metagenomic Seawater Samples
Citation type: mentioning
confidence: 99%
“…Here, the graph construction is the main limitation of the methods. Other tools allow false-positive results by using Approximate Membership Queries (AMQ) data structures to enhance space efficiency [6, 4, 20, 14, 27, 16]. They all use trade-offs between size and false-positive rate.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Several approaches utilize a specialized data structure for information retrieval known as a Bloom filter. FACS [27] uses a Bloom filter to classify DNA sequences. MetaProFi [28] uses a Bloom filter to build indexes of amino acid sequences to provide a memory-efficient and storage-efficient solution for protein sequence comparison.…”
Section: Introduction
Citation type: mentioning
confidence: 99%