2021
DOI: 10.1016/j.isci.2021.102782

Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences

Abstract (Highlights):
- Raptor is a tool to search through large collections of genomic texts.
- Raptor is 12-144 times faster and uses up to 30 times less RAM than COBS or Mantis.
- The Raptor index is 6-50 times faster to build.
- The use of minimizers and Bloom filters makes Raptor very space-efficient.
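
The highlights credit minimizers with much of Raptor's space saving: instead of storing every k-mer, only the smallest k-mer of each window (under some ordering) is kept. The following is a minimal Python sketch of (w, k)-minimizer selection. The 2-bit base encoding, the XOR-with-seed ordering, the seed constant, and the names `kmer_values` and `minimizers` are illustrative assumptions; Raptor's own implementation may differ, for instance in how it treats reverse complements and in its default parameters.

```python
def kmer_values(seq: str, k: int) -> list[int]:
    """Encode every k-mer of seq as an integer using 2 bits per base."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    mask = (1 << (2 * k)) - 1  # keep only the bits of the last k bases
    value = 0
    values = []
    for i, base in enumerate(seq):
        value = ((value << 2) | code[base]) & mask
        if i >= k - 1:  # a full k-mer has been read
            values.append(value)
    return values


def minimizers(seq: str, k: int, w: int, seed: int = 0x8F3F73B5CF1C9ADE) -> list[int]:
    """Scrambled (w, k)-minimizer values of seq, consecutive repeats collapsed.

    Each window covers w bases, i.e. w - k + 1 consecutive k-mers. The k-mer
    with the smallest scrambled value (k-mer XOR seed) represents the window.
    """
    if w < k:
        raise ValueError("window length w must be at least k")
    seed &= (1 << (2 * k)) - 1  # scramble only the bits a k-mer uses
    values = [v ^ seed for v in kmer_values(seq, k)]
    span = w - k + 1
    result = []
    for start in range(len(values) - span + 1):
        smallest = min(values[start:start + span])
        if not result or result[-1] != smallest:
            result.append(smallest)
    return result


seq = "ACGTTGCATGCAGTCAGTACGATCGATGCTAGCTAGGATCCA"
print(len(kmer_values(seq, 4)), "k-mers ->", len(minimizers(seq, 4, 8)), "minimizers")
```

Because neighbouring windows often share their minimum, the number of stored values falls well below the number of k-mers, and the reduction grows with w - k. Index and query must use the same k, w, and seed for the values to match.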

Cited by 8 publications (43 citation statements). References: 20 publications.
“…The runtimes were initially significantly improved by the Patro group with the tool Mantis in [18] and by the Iqbal group with COBS [3]. This year, the Reinert lab introduced the IBF [23], which has proven to be a significant step towards a very time and space efficient in-memory data structure for preprocessing approximate sequence queries, which opens up many possible applications. It improves in runtime by a factor of 12-144 over its competitors COBS and Mantis.…”
Section: Related Work (mentioning; confidence: 99%)
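
The IBF (interleaved Bloom filter) mentioned in this passage keeps one Bloom filter per bin, all of the same size and using the same hash functions, and interleaves them so that the bits of all bins at a given position sit next to each other. One lookup per hash function then yields a bit mask over all bins at once. The sketch below is a hypothetical, simplified Python model: Python integers stand in for machine words, and the SplitMix64-based hashing, the class name, and its parameters are assumptions, not the implementation Raptor uses.

```python
MASK64 = (1 << 64) - 1


def mix64(x: int) -> int:
    """SplitMix64 finalizer: scramble a 64-bit integer."""
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


class InterleavedBloomFilter:
    """`bins` Bloom filters of equal size that share the same hash functions.

    rows[p] holds bit p of every bin side by side (bin i at bit i), so one
    lookup at position p reads that position for all bins at once.
    """

    def __init__(self, bins: int, bits_per_bin: int, num_hashes: int) -> None:
        self.bins = bins
        self.bits_per_bin = bits_per_bin
        self.num_hashes = num_hashes
        self.rows = [0] * bits_per_bin

    def _positions(self, value: int) -> list[int]:
        # Derive num_hashes positions from differently offset mixes of the value.
        return [
            mix64((value + i * 0x9E3779B97F4A7C15) & MASK64) % self.bits_per_bin
            for i in range(self.num_hashes)
        ]

    def insert(self, bin_index: int, value: int) -> None:
        """Record value as present in bin bin_index."""
        for p in self._positions(value):
            self.rows[p] |= 1 << bin_index

    def query(self, value: int) -> int:
        """Bit mask of bins that may contain value (false positives possible)."""
        result = (1 << self.bins) - 1  # start with every bin as a candidate
        for p in self._positions(value):
            result &= self.rows[p]
        return result


ibf = InterleavedBloomFilter(bins=4, bits_per_bin=1024, num_hashes=3)
ibf.insert(0, 12345)
ibf.insert(2, 12345)
ibf.insert(3, 999)
print(format(ibf.query(12345), "04b"))  # bits for bins 0 and 2, plus any false positives
```

ANDing the rows keeps a bin only if all of its hash positions are set, so a bin that actually received the value is never missed, while other bins can occasionally show up as false positives.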
“…as a bit mask. Like Seiler et al [23], it uses minimizers to reduce the number of 𝑘-mers to be queried and thus the number of costly memory accesses. We decided on the Intel FPGA SDK for OpenCL version 2021.3 as the implementation environment, as it offers a high-level programming model with an acceptable overhead and encapsulates the entire host interaction in a well-known API, which allowed us to focus on the algorithmic optimizations of the problem.…”
Section: Count(p) (mentioning; confidence: 99%)
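
This passage describes query results held as bit masks and a reduced number of k-mers to look up thanks to minimizers. One common way to turn such masks into an answer, sketched here under assumptions, is to count, for each bin, how many of the query's minimizers hit it and to report the bins whose count reaches a threshold. The function names, the input format (one integer mask per query minimizer, bit i standing for bin i), and the fixed caller-supplied threshold are hypothetical; how Raptor derives its threshold is not shown.

```python
def count_per_bin(masks: list[int], num_bins: int) -> list[int]:
    """For each bin, count how many of the query's bit masks have that bin set."""
    counts = [0] * num_bins
    for mask in masks:
        for b in range(num_bins):
            if (mask >> b) & 1:
                counts[b] += 1
    return counts


def matching_bins(masks: list[int], num_bins: int, threshold: int) -> list[int]:
    """Bins whose hit count reaches the caller-supplied threshold."""
    counts = count_per_bin(masks, num_bins)
    return [b for b, c in enumerate(counts) if c >= threshold]


# One mask per query minimizer; bit i set means "bin i may contain it".
masks = [0b0101, 0b0111, 0b0001]
print(count_per_bin(masks, 4))     # [3, 1, 2, 0]
print(matching_bins(masks, 4, 2))  # [0, 2]
```

Counting hits rather than requiring every minimizer to match lets a query tolerate some mismatching minimizers (for example from sequencing errors), at the cost of having to choose a sensible threshold.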