Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics 2019
DOI: 10.1145/3307339.3342144

Practical Universal k-mer Sets for Minimizer Schemes

Abstract: Minimizer schemes have found widespread use in genomic applications as a way to quickly predict the matching probability of large sequences. Most methods for minimizer schemes use randomized (or close to randomized) ordering of k-mers when finding minimizers, but recent work has shown that not all non-lexicographic orderings perform the same. One way to find k-mer orderings for minimizer schemes is through the use of universal k-mer sets, which are subsets of k-mers that are guaranteed to cover all windows. Th…
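To make the abstract's terms concrete, here is a minimal Python sketch of minimizer selection under a pluggable k-mer ordering. It is an illustration only, not the paper's implementation: the function names `minimizers` and `universal_set_order`, the leftmost tie-breaking rule, and the choice to rank k-mers inside the set ahead of all others are assumptions made for this example.

```python
from typing import Callable, Iterable, List, Tuple


def minimizers(seq: str, k: int, w: int,
               order: Callable[[str], object]) -> List[Tuple[int, str]]:
    """Pick the smallest k-mer (under `order`) in every window of w consecutive k-mers.

    Returns the distinct (position, k-mer) pairs in left-to-right order.
    Ties are broken by the leftmost position (an assumption of this sketch).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked: List[Tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        # Index of the minimal k-mer in this window, leftmost on ties.
        best = min(range(start, start + w), key=lambda i: (order(kmers[i]), i))
        # Consecutive windows often share a minimizer; record it once.
        if not picked or picked[-1][0] != best:
            picked.append((best, kmers[best]))
    return picked


def universal_set_order(kmer_set: Iterable[str]) -> Callable[[str], object]:
    """Hypothetical ordering: k-mers in the set come first, then lexicographic order."""
    members = set(kmer_set)

    def order(kmer: str) -> object:
        return (0 if kmer in members else 1, kmer)

    return order


if __name__ == "__main__":
    seq = "ACGTACGT"
    print(minimizers(seq, k=2, w=3, order=lambda kmer: kmer))
    print(minimizers(seq, k=2, w=3, order=universal_set_order({"GT"})))
```

In this toy run, {"GT"} is not a universal set for these parameters, so one window contains no member of the set and the ordering falls back to lexicographic order there. With a truly universal set, every window would select a k-mer from the set.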

Citations: cited by 28 publications (31 citation statements)
References: 20 publications
“…The DOCKS [13] and ReMuVal [3] algorithms are heuristics to generate unavoidable sets for parameters k and L. Both of these algorithms use the Mykkeltveit set as a starting point. In many practical cases, the longest sequence that does not contain any k-mer from the Mykkeltveit set is much larger than the parameter L of interest (which for a compatible minimizers scheme correspond to the window length).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
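The property described in the statement above, that every sequence of length L must contain a k-mer from the set, can be checked by brute force for very small parameters. The sketch below assumes that L relates to the window size w (counted in k-mers) as L = w + k - 1. The function name `is_universal` is hypothetical, and exhaustive enumeration is a stand-in used only for illustration; it is not how DOCKS or ReMuVal work.

```python
from itertools import product
from typing import Iterable


def is_universal(kmer_set: Iterable[str], k: int, w: int,
                 alphabet: str = "ACGT") -> bool:
    """Return True if every string of length L = w + k - 1 contains a k-mer from the set.

    Enumerates all len(alphabet) ** L strings, so it is only usable for tiny k and w.
    """
    members = set(kmer_set)
    length = w + k - 1  # assumed relation between window size w and sequence length L
    for chars in product(alphabet, repeat=length):
        s = "".join(chars)
        # A string of this length holds exactly w overlapping k-mers.
        if not any(s[i:i + k] in members for i in range(w)):
            return False  # found a sequence that avoids the set
    return True


if __name__ == "__main__":
    # Toy two-letter alphabet keeps the enumeration small.
    print(is_universal({"AA", "AB", "BB"}, k=2, w=2, alphabet="AB"))
    print(is_universal({"AA", "BB"}, k=2, w=4, alphabet="AB"))
```

In the second call the alternating string "ABABA" avoids both "AA" and "BB", which mirrors the statement's point that a set can be avoided by sequences much longer than the L of interest.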
“…The idea of learning minimizer schemes tailored towards a target sequence has been previously explored, although to a lesser extent. Current approaches include heuristic designs [1, 8], greedy pruning [2] and construction of k-mer sets that are well-spread on the target sequence [20]. However, these methods only learn crude approximations of π by dividing k-mers into disjoint subsets with different priorities to be selected.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…The idea of constructing sequence sketches tailored to a specific sequence has been explored before (Chikhi et al., 2015; DeBlasio et al., 2019; Jain et al., 2020b), but it remains less understood than the average case. Random sequences have nice properties that allow for simplified probabilistic analysis.…”
Section: Introduction (citation type: mentioning)
confidence: 99%