2021
DOI: 10.1101/2021.06.18.449070
Preprint

Flexible seed size enables ultra-fast and accurate read alignment

Abstract: Short-read genome alignment is a fundamental computational step used in many bioinformatic analyses. It is therefore desirable to align such data as fast as possible. Most alignment algorithms consider a seed-and-extend approach. Several popular programs perform the seeding step based on the Burrows-Wheeler Transform with a low memory footprint, but they are relatively slow compared to more recent approaches that use a minimizer-based seeding-and-chaining strategy. Recently, syncmers and strobemers were propos…

Cited by 5 publications (10 citation statements)
References 50 publications (153 reference statements)
“…It has been shown that strobemers allow for much higher conservation (called match-coverage in Sahlin, 2021a) than k-mers. StrobeAlign (Sahlin, 2021b) is a new short-read aligner that combines syncmers and strobemers for extremely efficient alignment. Another example is the LCP (locally consistent parsing) technique (Hach et al., 2012; Sahinalp and Vishkin, 1996), which selects varying length substrings instead of k-mers in a locally consistent manner (i.e.…”
Section: Discussion (mentioning)
confidence: 99%
“…Strobemers are constructed by linking together a set of smaller k-mers and can be constructed with several different methods to link the k-mers (minstrobes, randstrobes, hybridstrobes), yielding different properties. It was shown that Strobemers could offer higher sensitivity and specificity over k-mers, and they have been used for short-read mapping [38], long-read overlap detection [18], and transcriptomic long-read normalization [33].…”
Section: Other Seed Constructs (mentioning)
confidence: 99%
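
The linking idea in the statement above can be illustrated with a minimal sketch of an order-2, randstrobe-style seed: a first k-mer is paired with a second k-mer picked from a downstream window by a hash. The hash-based linking rule, the helper names `_h` and `randstrobes2`, and the window parameters `w_min` and `w_max` are assumptions chosen for illustration; they are not taken from the cited works, whose linking functions may differ.

```python
import hashlib


def _h(s: str) -> int:
    """Deterministic 64-bit hash of a string (hypothetical helper)."""
    digest = hashlib.blake2b(s.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def randstrobes2(seq: str, k: int, w_min: int, w_max: int):
    """Return (i, j, seed) tuples for a simplified order-2 randstrobe-style seed.

    For each start i, the first strobe is seq[i:i+k]. The second strobe is the
    k-mer starting at some j in [i + w_min, i + w_max] that minimizes a hash of
    the concatenated pair. This linking rule is an illustrative assumption.
    """
    if k <= 0 or w_min <= 0 or w_max < w_min:
        raise ValueError("need k > 0 and 0 < w_min <= w_max")
    seeds = []
    last_start = len(seq) - k  # last position where a k-mer still fits
    for i in range(last_start + 1):
        lo = i + w_min
        hi = min(i + w_max, last_start)
        if lo > hi:
            break  # no room left for a second strobe
        first = seq[i:i + k]
        j = min(range(lo, hi + 1), key=lambda p: _h(first + seq[p:p + k]))
        seeds.append((i, j, first + seq[j:j + k]))
    return seeds


if __name__ == "__main__":
    for i, j, seed in randstrobes2("ACGTTGCATGCCGATAGCTAGCTAGGATCC", 3, 4, 8)[:5]:
        print(i, j, seed)
```

Because the second strobe's offset varies from seed to seed, two sequences that differ by a small insertion or deletion between the strobes can still share a seed, which is the property the citing text attributes to strobemers.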
“…The definition of E-hits was given in [38] and is a measure of how repetitive the seeds in a query sequence are, on average, in a reference dataset. More specifically, the E-hits computes the expected number of hits that seeds constructed from a query sequence obtained uniformly at random from the reference will have.…”
Section: E-hits Of Seeds (mentioning)
confidence: 99%
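
One way to read the definition above: if a seed is drawn uniformly at random from all N seeds in the reference, a seed that occurs c times is drawn with probability c/N and produces c hits, so the expected number of hits is the sum of c² over distinct seeds, divided by N. The sketch below works under two assumptions that are not stated in the quoted text: seeds are plain k-mers, and sampling is per seed occurrence. The function names `kmer_seeds` and `e_hits` are hypothetical, and the exact estimator used in [38] may differ.

```python
from collections import Counter


def kmer_seeds(reference: str, k: int) -> list[str]:
    """All overlapping k-mers of the reference, used here as stand-in seeds."""
    return [reference[i:i + k] for i in range(len(reference) - k + 1)]


def e_hits(seeds: list[str]) -> float:
    """Expected hit count for a seed sampled uniformly from `seeds`.

    A seed with multiplicity c is sampled with probability c / N and matches
    c positions, so the expectation is sum(c * c) / N.
    """
    counts = Counter(seeds)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no seeds: reference shorter than k?")
    return sum(c * c for c in counts.values()) / total


if __name__ == "__main__":
    # ACGT occurs twice, the other three 4-mers once: (4 + 1 + 1 + 1) / 5
    print(e_hits(kmer_seeds("ACGTACGT", 4)))  # 1.4
```

A value of 1.0 means every seed is unique in the reference; larger values indicate more repetitive seeds.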
“…As the number and depth of high-throughput sequencing experiments grows, efficient methods to map, store, and search DNA sequences have become critical in their analysis. Sequence sketching is a fundamental building block of many of the basic sequence analysis tasks, such as assembly [20,4], alignment [22,19,11], and binning [2,1,6]. The common principle in all sketching techniques is the selection of a k-mer representative from a long DNA sequence for indexing sequences in data structures or algorithms.…”
Section: Introduction (mentioning)
confidence: 99%
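
The "k-mer representative" principle in the statement above can be sketched with window minimizers, one common sketching scheme: from every window of w consecutive k-mers, keep the smallest one. The choice of minimizers as the example, the lexicographic ordering (practical tools usually order k-mers by a hash), and the function name `minimizers` are assumptions for illustration, not details from the citing paper.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return sorted (position, k-mer) pairs selected as window minimizers.

    Each window covers w consecutive k-mers; the lexicographically smallest
    k-mer is kept, with ties broken by the leftmost position.
    """
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        picked.add(min(window, key=lambda i: (kmers[i], i)))
    return sorted((p, kmers[p]) for p in picked)


if __name__ == "__main__":
    # 6 k-mers in total, 4 of them selected as representatives
    print(minimizers("ACGTTGCA", k=3, w=3))
```

Adjacent windows often share the same minimizer, so the selected set is typically smaller than the full k-mer set while still sampling every stretch of w consecutive k-mers.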