2023
DOI: 10.1101/gr.277645.123

Entropy predicts sensitivity of pseudorandom seeds

Abstract: Seed design is important for sequence similarity search applications such as read mapping and average nucleotide identity (ANI) estimation. While k-mers and spaced k-mers are likely the most well-known and used seeds, sensitivity suffers at high error rates, particularly when indels are present. Recently, we developed a pseudo-random seeding construct, strobemers, which were empirically demonstrated to have high sensitivity also at high indel rates. However, the study lacked a deeper understanding of why. In thi…

Cited by 2 publications (11 citation statements)
References 60 publications
“…For the runtime, we evaluated randstrobes parametrized as (n = 2, l = 20, w_min = 21, w_max = 100) and (n = 2, l = 20, w_min = 21, w_max = 1000) since the window size affects runtime. Strobemers with n > 3 show no substantial gain in the context of sequence matching at the cost of additional runtime [12] (although they have been modified and used for specific scenarios [8]). Also, the relative performance can be extrapolated from the n = 2 and n = 3 cases, since the construction is recursive, therefore, we omit them in this study.…”
Section: Results (mentioning)
confidence: 99%
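
To make the parametrization in the statement above concrete, here is a minimal Python sketch of order-2 (n = 2) randstrobe construction with l = 20, w_min = 21, w_max = 100. It is an illustration under stated assumptions, not the cited authors' implementation: the hash function (`blake2b`), the XOR linking rule used to pick the second strobe, the handling of sequence ends, and the names `kmer_hash` and `randstrobes_order2` are all choices made for this example.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (illustrative choice, not from the source)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def randstrobes_order2(seq: str, l: int = 20, w_min: int = 21, w_max: int = 100):
    """Return (i, j) start positions of the two strobes of each order-2 seed.

    The first strobe starts at i. The second strobe starts at some j with
    w_min <= j - i <= w_max, chosen pseudorandomly through the hash values.
    ASSUMPTION: the linking rule is "minimize hash(strobe1) XOR hash(strobe2)",
    with ties broken by the smallest j; published tools use various rules.
    """
    hashes = [kmer_hash(seq[i:i + l]) for i in range(len(seq) - l + 1)]
    seeds = []
    for i, h1 in enumerate(hashes):
        lo = i + w_min
        hi = min(i + w_max, len(hashes) - 1)
        if lo > hi:
            # ASSUMPTION: stop when no full window start remains; real tools
            # may instead shrink the window near the end of the sequence.
            break
        j = min(range(lo, hi + 1), key=lambda p: (h1 ^ hashes[p], p))
        seeds.append((i, j))
    return seeds


seq = "ACGT" * 50  # toy 200-nt sequence
seeds = randstrobes_order2(seq)
```

Each position scans up to w_max - w_min + 1 candidate strobes in this naive form, which is one way to see why the window size affects runtime, as the quoted statement notes.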
“…The methods to select strobes differ [18], and using alternating strobe lengths has also been explored [12]. However, randstrobes were shown to be more sensitive for sequence matching than other methods using fixed strobe lengths (minstrobes and hybridstrobes) [18], and simpler to construct than alternating strobe lengths (altstrobes and multistrobes) [12], and is so far most commonly implemented in practice, e.g., [20,15,23]. Therefore, we will consider only the randstrobes method in this study.…”
Section: Methods (mentioning)
confidence: 99%