2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2014
DOI: 10.1109/bibm.2014.6999305

Spaced seed data structures

Abstract: Over the past decade, genome sciences have benefited from rapid advances in DNA sequencing technologies, and the development of efficient algorithms for processing short nucleotide sequences played a key role in enabling their uptake in the field. In particular, reassembly of human genomes (de novo or reference-guided) from short DNA sequence reads had a substantial impact on health research. De novo assembly of a genome is essential in the absence of a reference genome sequence for a species. It is also gaining tract…

Cited by 3 publications (3 citation statements)
References: 37 publications
“…Recent efforts to minimize the genome assembly resource footprint have led to the implementation of several memory‐efficient assemblers (Simpson and Durbin, ; Conway and Bromage, ; Chikhi and Rizk, ; Ye et al., ), but usually at the expense of time and accuracy. We have been preoccupied by the scale problem for some time (Simpson et al., ) and have recently outlined and presented the theory behind assembly by spaced seeds, a re‐design of the traditional k‐mer that, even in current data structure implementations, has potential for an over two‐fold speed‐up and a four‐fold reduction in memory without compromising on assembly correctness (Birol et al., ).…”
Section: Discussion (mentioning)
confidence: 97%
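The statement above describes spaced seeds as a re-design of the contiguous k-mer. The sketch below makes the idea concrete under an assumption: a seed is written as a mask of '1' (care) and '0' (don't-care) positions, a common convention in the spaced-seed literature. The functions spaced_seeds, kmers and build_index are hypothetical names. The dictionary index is a plain hash table chosen for illustration, not the data structure described in the paper.

```python
from __future__ import annotations

from collections import defaultdict


def spaced_seeds(seq: str, mask: str) -> list[str]:
    """Extract the spaced seed at every offset of seq.

    mask uses '1' for positions that are kept (care) and '0' for
    positions that are skipped (don't-care); len(mask) is the span.
    """
    span = len(mask)
    care = [i for i, c in enumerate(mask) if c == "1"]
    return [
        "".join(seq[start + i] for i in care)
        for start in range(len(seq) - span + 1)
    ]


def kmers(seq: str, k: int) -> list[str]:
    """A contiguous k-mer is the special case of an all-'1' mask."""
    return spaced_seeds(seq, "1" * k)


def build_index(seq: str, mask: str) -> dict[str, list[int]]:
    """Map each spaced seed to the offsets where it occurs (hash index)."""
    index = defaultdict(list)
    for offset, seed in enumerate(spaced_seeds(seq, mask)):
        index[seed].append(offset)
    return dict(index)


print(spaced_seeds("ACGTACGT", "1101"))  # ['ACT', 'CGA', 'GTC', 'TAG', 'ACT']
print(build_index("ACGTACGT", "1101"))   # {'ACT': [0, 4], 'CGA': [1], 'GTC': [2], 'TAG': [3]}
```

A seed's stored key is only as long as its weight (the number of '1' positions), even though it spans len(mask) bases of the sequence.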
“…Several extensions of the spaced seed model have then been proposed on the two aforementioned problems: vector seeds [5], one-gapped q-grams [6] or indel seeds [7, 8], neighbor seeds [9, 10], transition seeds [11–15], multiple seeds [16–19], adaptive seeds [20] and related work on the associated indexes [21–26], just to mention a few.…”
Section: Introduction (mentioning)
confidence: 99%
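A standard way to compare seed designs like those listed above is sensitivity. Sensitivity is the probability that a seed hits at least once in an ungapped alignment of length L in which each position independently matches with probability p. The sketch below computes this probability by dynamic programming under that Bernoulli model, which is an assumption here because the quoted work may use other criteria. hit_probability is a hypothetical name, and the code covers only single '1'/'0' masks, not the vector, indel, or transition variants.

```python
from collections import defaultdict


def hit_probability(mask: str, length: int, p: float) -> float:
    """P(seed `mask` hits at least once in a random 0/1 alignment of `length`).

    Each alignment position is a match (1) with probability p, independently
    (Bernoulli model, an assumption). '1' in mask = must match, '0' = don't care.
    """
    span = len(mask)
    # Bit t of a window is position (j - t); mask index i maps to bit span-1-i.
    seed_bits = 0
    for i, c in enumerate(mask):
        if c == "1":
            seed_bits |= 1 << (span - 1 - i)
    window_mask = (1 << span) - 1
    keep = (1 << (span - 1)) - 1  # history needed for the next step
    states = {0: 1.0}  # last span-1 bits -> probability of no hit so far
    hit = 0.0
    for j in range(length):
        nxt = defaultdict(float)
        for state, prob in states.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                window = ((state << 1) | bit) & window_mask
                # A seed occurrence ending at j needs j >= span - 1.
                if j >= span - 1 and (window & seed_bits) == seed_bits:
                    hit += prob * q  # first hit: stop tracking this path
                else:
                    nxt[window & keep] += prob * q
        states = nxt
    return hit


print(hit_probability("11", 3, 0.5))  # 0.375
```

The state is the last span-1 match/mismatch bits, so the cost grows roughly as L · 2^(span-1). This is fine for short masks but impractical for long ones without a more compact formulation.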
“…A single base difference, due to real biological variation or a sequencing error, affects all k-mers crossing that position thus impeding direct analyses by identity. Also, given the strong interdependence of local sequence, contiguous sections capture less information about genome structure and are thus more affected by sequence repetition (Chaisson et al, 2009; Birol et al, 2015).…”
Section: Introduction (mentioning)
confidence: 99%
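The statement above notes that a single base difference affects every k-mer crossing that position. The toy example below illustrates this and shows how a spaced seed of the same span behaves. It assumes the '1'/'0' mask convention, and spaced_seeds and changed_offsets are hypothetical helpers. A single substitution is introduced, and the code lists the offsets whose seed changes.

```python
def spaced_seeds(seq, mask):
    """Seed at every offset: keep the '1' (care) positions of mask, skip the '0's."""
    care = [i for i, c in enumerate(mask) if c == "1"]
    return ["".join(seq[s + i] for i in care)
            for s in range(len(seq) - len(mask) + 1)]


def changed_offsets(ref, alt, mask):
    """Offsets whose seed differs between two equal-length sequences."""
    pairs = zip(spaced_seeds(ref, mask), spaced_seeds(alt, mask))
    return [s for s, (a, b) in enumerate(pairs) if a != b]


ref = "ACGTTGCAAC"
alt = ref[:5] + "A" + ref[6:]  # single substitution at position 5 (G -> A)

# Every contiguous 4-mer whose window covers position 5 changes.
print(changed_offsets(ref, alt, "1111"))  # [2, 3, 4, 5]
# With the same span, the window at offset 3 puts its don't-care on
# position 5, so that seed is unchanged.
print(changed_offsets(ref, alt, "1101"))  # [2, 4, 5]
```

Away from the sequence ends, the number of affected seeds equals the seed's weight (its count of '1' positions). The example therefore shows tolerance to a mismatch inside a window of fixed span, not a general reduction in the number of affected seeds.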