2017
DOI: 10.1371/journal.pcbi.1005777

Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing

Abstract: With the rapidly increasing volume of deep sequencing data, more efficient algorithms and data structures are needed. Minimizers are a central recent paradigm that has improved various sequence analysis tasks, including hashing for faster read overlap detection, sparse suffix arrays for creating smaller indexes, and Bloom filters for speeding up sequence search. Here, we propose an alternative paradigm that can lead to substantial further improvement in these and other tasks. For integers k and L > k, we say t…

Cited by 56 publications (69 citation statements)
References 19 publications
“…Furthermore, there are applications where the k-mer set is not related to sequence read data at all, e.g. a universal hitting set [26], a chromosome-specific reference dictionary [27], or a winnowed min-hash sketch (for example as in [28], or see [29,30] for a survey).…”
Section: Related Work (mentioning)
confidence: 99%
“…Universal k-mer sets are central to the construction of orders with low density [11]. In fact, the proposed method, just like DOCKS [16,15], is a heuristics to construct universal sets.…”
Section: Universal Sets and Compatible Orders (mentioning)
confidence: 99%
“…The problem of finding an optimal order, i.e., an order with the lowest possible density, is still open [13]. Orenstein et al [16] proposed a heuristic, DOCKS, that is used to create orders with low density. Unfortunately, this method has a compute time that is over exponential in k and is impractical for k ≥ 10.…”
Section: Introduction (mentioning)
confidence: 99%
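To make the notion of the "density" of an order concrete, here is a minimal Python sketch. It is not DOCKS or any published method. It assumes the usual minimizer convention (in each window of w consecutive k-mers, pick the smallest k-mer under the chosen order, breaking ties to the left) and defines density as the fraction of k-mer positions picked. The function names and the default lexicographic order are illustrative assumptions.

```python
def minimizer_positions(seq, k, w, order=None):
    """Return sorted start positions of the k-mers picked as window minimizers.

    `order` is an optional key function on k-mers; the default is plain
    lexicographic order (an assumption made for this illustration).
    """
    key = order if order is not None else (lambda kmer: kmer)
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    # Slide a window over w consecutive k-mers.
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        # Smallest k-mer under the order; ties go to the leftmost position.
        best = min(window, key=lambda i: (key(kmers[i]), i))
        picked.add(best)
    return sorted(picked)


def density(seq, k, w, order=None):
    """Fraction of k-mer positions in `seq` that are picked as minimizers."""
    n_kmers = len(seq) - k + 1
    return len(minimizer_positions(seq, k, w, order)) / n_kmers


if __name__ == "__main__":
    print(minimizer_positions("ACGTACGT", k=2, w=3))
    print(density("ACGTACGT", k=2, w=3))
```

Passing a different `order` key changes which positions are picked, which is why the choice of order affects density.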
“…The first one computes a set of k-mers that covers every path of length w in the de Bruijn graph (an extension of the set cover problem). This problem was studied in Orenstein et al (2017) and Marçais et al (2017), and this new algorithm gives an asymptotically optimal solution. The second algorithm gives the order between k-mers for the minimizers schemes in (I).…”
Section: Introduction (mentioning)
confidence: 99%
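The covering condition described in the statement above can be checked by brute force for very small parameters. The sketch below is not the algorithm from any of the cited papers. It assumes that a "path of length w" means w consecutive k-mers, so each path spells a string of length w + k - 1, and it enumerates every such string. The function name and the `alphabet` parameter are hypothetical, and the run time grows exponentially, so this is only usable for toy inputs.

```python
from itertools import product


def is_universal_hitting_set(kmer_set, k, w, alphabet="ACGT"):
    """Return True if every string of length w + k - 1 over `alphabet`
    contains at least one k-mer from `kmer_set`.

    Assumption: a path of length w in the de Bruijn graph is read as
    w consecutive k-mers. Brute force, exponential in w + k - 1.
    """
    length = w + k - 1
    for chars in product(alphabet, repeat=length):
        s = "".join(chars)
        # The string has exactly w k-mers, starting at positions 0..w-1.
        if not any(s[i:i + k] in kmer_set for i in range(w)):
            return False  # found a path that the set does not hit
    return True


if __name__ == "__main__":
    # Toy example on a two-letter alphabet with k = 2 and w = 2.
    print(is_universal_hitting_set({"AA", "AB", "BB"}, k=2, w=2, alphabet="AB"))
    print(is_universal_hitting_set({"AA", "BB"}, k=2, w=2, alphabet="AB"))
```

In the toy example the first set hits every 3-letter string, while the second misses the string ABA, whose k-mers are AB and BA.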