2009
DOI: 10.1007/978-3-642-03784-9_7

Compressed Suffix Arrays for Massive Data

Abstract: We present a fast space-efficient algorithm for constructing compressed suffix arrays (CSA). The algorithm requires O(n log n) time in the worst case, and only O(n) bits of extra space in addition to the CSA. As the basic step, we describe an algorithm for merging two CSAs. We show that the construction algorithm can be parallelized in a symmetric multiprocessor system, and discuss the possibility of a distributed implementation. We also describe a parallel implementation of the algorithm, capable of…

Cited by 39 publications (46 citation statements)
References: 25 publications
“…Once the unique subset U of R has been calculated, we do not need to recompute the FM-index of U from scratch. The BWT of U can be derived from the FM-index of R by marking the positions in B_R that correspond to reads that were discarded and exporting the unmarked positions as B_U (Sirén 2009). …”
Section: Read Filtering (mentioning; confidence: 99%)
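The subset extraction described in the statement above can be illustrated with a minimal Python sketch. It assumes that read i ends with its own terminator $_i, that terminators sort before all letters and among themselves by read index, and it builds the BWT naively by sorting suffixes. The positions of each discarded read are marked by LF-mapping backward from that read's terminator, and the unmarked positions are exported. The function names (collection_bwt, c_array, mark_read, filter_bwt) are hypothetical; this shows the principle only, not the implementation of Sirén (2009) or of the citing paper.

```python
from collections import Counter


def collection_bwt(reads):
    """Naive BWT of a read collection (illustration only, far from linear time).

    Assumption: read i is terminated by its own sentinel $_i; sentinels sort
    before all letters and among themselves by read index.
    """
    entries = []
    for i, s in enumerate(reads):
        for j in range(len(s) + 1):
            # Letters become (1, code); the sentinel $_i becomes (0, i).
            key = [(1, ord(c)) for c in s[j:]] + [(0, i)]
            # BWT symbol = character preceding this suffix ("$" at the read start).
            entries.append((key, s[j - 1] if j > 0 else "$"))
    entries.sort(key=lambda e: e[0])
    return "".join(sym for _, sym in entries)


def c_array(bwt):
    """C[c] = number of BWT symbols smaller than c ("$" sorts first in ASCII)."""
    counts, C, total = Counter(bwt), {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    return C


def mark_read(bwt, C, read_id):
    """Collect every BWT position belonging to one read by LF-mapping backward."""
    p = read_id  # suffix "$_i" has rank i, since all sentinel-only suffixes sort first
    positions = [p]
    while bwt[p] != "$":  # "$" means p is the suffix starting at the read's first character
        c = bwt[p]
        p = C[c] + bwt[:p].count(c)  # LF(p) = C[c] + rank_c(bwt, p)
        positions.append(p)
    return positions


def filter_bwt(bwt, discarded):
    """Derive the BWT of the kept reads by dropping positions of discarded reads."""
    C = c_array(bwt)
    marked = set()
    for r in discarded:
        marked.update(mark_read(bwt, C, r))
    return "".join(sym for p, sym in enumerate(bwt) if p not in marked)


reads = ["ACGT", "ACGA", "TTAC", "ACGT"]
b_r = collection_bwt(reads)
b_u = filter_bwt(b_r, discarded={1, 3})
print(b_u == collection_bwt([reads[0], reads[2]]))  # True
```

This works because removing a read deletes only that read's suffixes. The kept suffixes keep their relative order, since terminator order among the kept reads is unchanged, and they keep their preceding characters. A practical index would replace the linear `count` calls with a rank structure over B_R.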
“…Linear time algorithms exist for the task, but their practical bottleneck is the peak memory consumption. Although there exist general time‐efficient and space‐efficient construction algorithms, it turned out that our special case of text collection admits a tailored incremental BWT construction algorithm (see the references and experimental comparison therein for previous work on BWT construction): The text collection is split into several smaller collections, and a temporary index is built for each of them separately. The temporary indexes are then merged and finally, converted into a static FM‐index.…”
Section: Text Representation (mentioning; confidence: 99%)
“…Linear time algorithms exist for the task, but their practical bottleneck is the peak memory consumption. Although there exist general time-efficient and space-efficient construction algorithms, it turned out that our special case of text collection admits a tailored incremental BWT construction algorithm [40] (see the references and experimental comparison therein for previous work on BWT construction): The text collection is split into several smaller collections, and a temporary index is built for each of them separately. The temporary indexes are then merged and finally, converted into a static FM-index.…”
Section: Construction and Text Extraction (mentioning; confidence: 99%)
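The incremental scheme summarized in these statements builds temporary indexes for sub-collections and then merges them. The following minimal Python sketch shows one way to merge two plain BWTs, assuming that read i ends with its own terminator $_i, that terminators sort before letters and by read index, and that the reads of B are numbered after those of A. For each read of B it walks backward, uses backward search in A to count how many A-suffixes precede each suffix, and places the B symbol at that merged rank. The names collection_bwt and merge_bwt are hypothetical. This illustrates the merging idea on uncompressed strings and is not the algorithm of Sirén (2009) or of the citing papers.

```python
from collections import Counter


def collection_bwt(reads):
    """Naive BWT of a read collection; read i ends with sentinel $_i (sorted by i)."""
    entries = []
    for i, s in enumerate(reads):
        for j in range(len(s) + 1):
            key = [(1, ord(c)) for c in s[j:]] + [(0, i)]  # sentinel sorts before letters
            entries.append((key, s[j - 1] if j > 0 else "$"))
    entries.sort(key=lambda e: e[0])
    return "".join(sym for _, sym in entries)


def merge_bwt(bwt_a, bwt_b):
    """Merge BWTs of collections A and B into the BWT of A followed by B.

    Assumption: read k of B gets sentinel id n_A + k, so every B sentinel
    sorts after every A sentinel (matching collection_bwt(a + b)).
    """
    n_a, n_b = bwt_a.count("$"), bwt_b.count("$")  # one "$" per read
    count_a, count_b = Counter(bwt_a), Counter(bwt_b)

    def less_than(counts, c):
        # Number of BWT symbols strictly smaller than c ("$" < letters in ASCII).
        return sum(v for k, v in counts.items() if k < c)

    merged = [None] * (len(bwt_a) + len(bwt_b))
    for k in range(n_b):
        r = n_a  # A-suffixes smaller than "$_(n_A+k)": exactly A's n_A sentinel suffixes
        q = k    # rank of that sentinel suffix within B
        while True:
            c = bwt_b[q]
            merged[r + q] = c  # merged rank = smaller A-suffixes + smaller B-suffixes
            if c == "$":  # reached the suffix starting at the read's first character
                break
            r = less_than(count_a, c) + bwt_a[:r].count(c)  # backward-search step in A
            q = less_than(count_b, c) + bwt_b[:q].count(c)  # LF step in B
    # A's symbols fill the remaining slots in their original relative order.
    rest = iter(bwt_a)
    return "".join(sym if sym is not None else next(rest) for sym in merged)


a = ["ACGT", "TTAC"]
b = ["ACGA", "GATT"]
print(merge_bwt(collection_bwt(a), collection_bwt(b)) == collection_bwt(a + b))  # True
```

Merging repeatedly, for example merge_bwt(merge_bwt(x, y), z), mirrors building a temporary index per sub-collection and folding them together. Conversion of the final result into a static FM-index is not shown.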