2021
DOI: 10.1101/gr.275648.121

Effective sequence similarity detection with strobemers

Abstract: k-mer-based methods are widely used in bioinformatics for various types of sequence comparisons. However, a single mutation will mutate k consecutive k-mers and make most k-mer-based applications for sequence comparison sensitive to variable mutation rates. Many techniques have been studied to overcome this sensitivity, for example, spaced k-mers and k-mer permutation techniques, but these techniques do not handle indels well. For indels, pairs or groups of small k-mers are commonly used, but these methods fir…

Citations: cited by 61 publications (193 citation statements)
References: 71 publications
“…The main idea of the seeding approach presented here is to first compute open syncmers (21) from the reference sequences, then link the syncmers together using the randstrobe method (22) with two strobes. The study introducing strobemers (22) described strobemers as linking together strobes in ‘sequence-space’, i.e., over the set of all k-mers. Since syncmers represent a subset of k-mers from the original sequence, computing randstrobes over this subset of strings is very fast; it suffice to compare a smaller set of syncmers to produce the next strobe, while still having a similar range on the upper and lower window bounds on the original sequence.…”
Section: Methods (mentioning)
confidence: 99%
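
The two-step seeding idea in the statement above (select open syncmers, then link them with the randstrobe method) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the citing tool's implementation: the function names `open_syncmers` and `link_randstrobes`, the CRC32 stand-in hash `h`, the parameters `k`, `s`, `t`, and the modulus `prime` are all hypothetical choices made for the example. The link rule (pick the candidate minimizing the sum of the two hashes modulo a prime) is a simplified randstrobe-style rule and may differ from what any particular tool uses.

```python
import zlib


def h(s: str) -> int:
    """Deterministic stand-in hash (CRC32). An assumption for this sketch only."""
    return zlib.crc32(s.encode("ascii"))


def open_syncmers(seq: str, k: int, s: int, t: int):
    """Return (position, k-mer) pairs for k-mers whose smallest-hash s-mer
    starts at offset t inside the k-mer (the open syncmer criterion)."""
    selected = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        smer_hashes = [h(kmer[j:j + s]) for j in range(k - s + 1)]
        # index() returns the leftmost minimum, which breaks ties deterministically
        if smer_hashes.index(min(smer_hashes)) == t:
            selected.append((i, kmer))
    return selected


def link_randstrobes(syncmers, w_min: int, w_max: int, prime: int = 997):
    """Link each syncmer s1 to one syncmer s2 chosen among the syncmers that are
    w_min..w_max positions downstream in the syncmer list (not in the sequence)."""
    seeds = []
    for i, (pos1, s1) in enumerate(syncmers):
        window = syncmers[i + w_min:i + w_max + 1]
        if not window:
            continue  # no downstream syncmer left to link to
        # Simplified randstrobe-style link rule (assumed for illustration)
        pos2, s2 = min(window, key=lambda c: (h(s1) + h(c[1])) % prime)
        seeds.append((pos1, pos2, s1, s2))
    return seeds
```

Because the window is counted in syncmers rather than in bases, each link step compares only a handful of candidates, which is the speed argument the statement makes.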
“…This means that $s_1$, $s_2$, and $s'$ are syncmers, and we will let $[w_{\min}, w_{\max}]$ refer to the lower and upper number of syncmers downstream from $s_1$ where we will sample $s_2$ from. A second modification to the strobemers as described in (22) is that we store the strobemer hash value from two strobes $s_1$ and $s_2$ as $H(s_1, s_2) = v(s_1)/2 + v(s_2)/2$. The hash function $H$ is symmetric ($h(v(s_1), v(s_2)) = h(v(s_2), v(s_1))$) and together with canonical syncmers it produces the same hash value if the strobemer is created from forward and reverse complement direction.…”
Section: Methods (mentioning)
confidence: 99%
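
The symmetry property described in the statement above can be checked with a small sketch. The assumptions are mine, not the source's: `v` is modeled as a canonical value (the smaller of the CRC32 hashes of a string and its reverse complement), the halving uses integer division, and the names `revcomp`, `v`, and `strobemer_hash` are hypothetical.

```python
import zlib

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return s.translate(_COMPLEMENT)[::-1]


def v(s: str) -> int:
    """Canonical value: the same for a string and its reverse complement.
    CRC32 is a stand-in hash assumed for this sketch."""
    return min(zlib.crc32(s.encode("ascii")),
               zlib.crc32(revcomp(s).encode("ascii")))


def strobemer_hash(s1: str, s2: str) -> int:
    """H(s1, s2) = v(s1)/2 + v(s2)/2, with integer halving (an assumption)."""
    return v(s1) // 2 + v(s2) // 2


# On the reverse strand the strobes appear reverse-complemented and in swapped order.
fwd = strobemer_hash("ACGTA", "GGCAT")
rev = strobemer_hash(revcomp("GGCAT"), revcomp("ACGTA"))
assert fwd == rev
```

Addition is commutative, so swapping the strobes leaves the sum unchanged, and the canonical values absorb the reverse complementation. Together these give the strand-independent hash value the statement describes.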