2008
DOI: 10.1145/1227161.1402296

Better external memory suffix array construction

Abstract: Suffix arrays are a simple and powerful data structure for text processing that can be used for full text indexes, data compression, and many other applications, in particular in bioinformatics. However, so far it has looked prohibitive to build suffix arrays for huge inputs that do not fit into main memory. This paper presents design, analysis, implementation, and experimental evaluation of several new and improved algorithms for suffix array construction. The algorithms are asymptotically optimal in the worst…

Cited by 80 publications (78 citation statements)
References 33 publications
“…In the first line of work, a plethora of published material exists dealing with the externalization of the suffix tree: [49,74,86,16,49,12,13,18,21,56,84,86] and the suffix array [24,25]. Most of these works suffer from various problems such as nonscalability, nonavailability of suffix links (that are necessary for the implementation of various operations) and nontolerance to data skew, and a few are the works that manage to face effectively these problems; from these works, we will present briefly the approach in [74].…”
Section: String Data Structures In Memory Hierarchies
Citation type: mentioning
confidence: 99%
“…Many external memory algorithms, implemented using this layer, can save factor of 2-3 in I/Os. For example, the algorithms for external memory suffix array construction implemented with this module [15] require only 1/3 of I/Os which must be performed by implementations that use conventional data structures and algorithms (either from Stxxl STL-user layer, or LEDA-SM, or TPIE). The win is due to an efficient interface, that couples the input and the output of the algorithm-components (scans, sorts, etc.).…”
Section: Stxxl Design
Citation type: mentioning
confidence: 99%
“…The question "When a pipelined execution of the computations in a data flow graph G is possible in an I/O-efficient way?" is analyzed in [15].…”
Section: Streaming Layer
Citation type: mentioning
confidence: 99%
“…Unfortunately all the experiments reported so far have been performed with data sets at most a few gigabytes in size [24,7,14,5,17,16], telling that the construction algorithms have trouble scaling up for massive data sets.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Other alternatives have been to use secondary memory suffix array construction algorithms [4,5], dynamic indexes [3,19,20,11,25], or algorithms for constructing the compressed index directly [15,22,13]. While these algorithms are often memory efficient, they are also slow.…”
Section: Introduction
Citation type: mentioning
confidence: 99%