Engineering a compressed suffix tree implementation

Välimäki, Niko; Mäkinen, Veli; Gerlach, Wolfgang; Dixit, Kashyap

doi:10.1145/1498698.1594228

Cited by 21 publications

(17 citation statements)

References 36 publications

(39 reference statements)


“…A natural alternative to this approach is to use specialized compressed data structures which allow specific operations to be performed on compressed data. This study of compressed or succinct data structures is advancing rapidly; for example, both Grossi et al and Välimäki et al show how compressed suffix trees help in practice for real-world problems by reducing memory use [26,49]. While we believe this is a promising approach, to our knowledge, no compressed data structure provides a suitable interface to implement a generic GroupBy-Aggregate operation.…”

Section: Compression For Memory Efficiency (mentioning)

confidence: 92%

“…DRAM is expensive and an expensive consumer of power [29]. Memory accesses are a common bottleneck for high-performance applications [49]. With the number of cores per socket growing faster than the memory Table 1: Amazon EC2 proportional resource costs (# resource units × per-hour unit resource cost); the per-hour unit resource costs are 1.51¢ (1 Elastic Compute Unit), 1.93¢ (1GB RAM) and 0.018¢ (1GB storage); analysis detailed in §5.3.…”

Section: Introduction (mentioning)

confidence: 99%


Memory-efficient groupby-aggregate using compressed buffer trees

Amur, Richter, Andersen, et al. 2013

Proceedings of the 4th Annual Symposium on Cloud Computing


Memory is rapidly becoming a precious resource in many data processing environments. This paper introduces a new data structure called a Compressed Buffer Tree (CBT). Using a combination of buffering, compression, and lazy aggregation, CBTs can improve the memory efficiency of the GroupBy-Aggregate abstraction, which forms the basis of many data processing models such as MapReduce and databases. We evaluate CBTs in the context of MapReduce aggregation, and show that CBTs can provide significant advantages over existing hash-based aggregation techniques: up to 2× less memory and 1.5× the throughput, at the cost of 2.5× CPU.

“…We compare the following CST implementations: Välimäki et al's [20] implementation of Sadakane's compressed suffix tree [11] (CST-Sadakane); Russo's implementation of Russo et al's "fully-compressed" suffix tree [14] (FCST); and our best variants. These are called Our CST in the plots.…”

Section: Comparing the CST Implementations (mentioning)

confidence: 99%

“…The solution based on explicit topology was implemented by Välimäki et al [20]. As expected from theory, the structure is very fast, achieving a few tens of microseconds per operation, but uses significant space (about 25-35 bpc, close to a suffix array).…”

Section: Introduction (mentioning)

confidence: 99%

“…As predicted by theory once again, we achieve practical implementations that lie between the two previous extremes (too large or too slow) and offer attractive space/time tradeoffs. One variant shows to be superior to the original implementation of Sadakane's CST [20] in both space and time: It uses 13-16 bpc (i.e., half the space) and requires a few microseconds per operation (i.e., several times faster). A second variant works within 8-12 bpc and requires a few hundreds of microseconds per operation, that is, smaller than our first variant and still several times faster than Russo's implementation.…”

Section: Introduction (mentioning)

confidence: 99%
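
To put the space figures quoted in the statements above into perspective, here is a minimal sketch that converts bits per character (bpc) into index size. The text length of 100 million characters and the function name `index_size_bytes` are assumptions chosen for illustration; they do not come from the cited papers. Only the bpc ranges are taken from the quoted statements.

```python
def index_size_bytes(num_chars: int, bits_per_char: float) -> float:
    """Size in bytes of an index that spends bits_per_char bits per text character."""
    return num_chars * bits_per_char / 8


# Hypothetical text length (an assumption, not a figure from the cited papers).
n = 100_000_000

# bpc ranges as quoted in the citation statements above.
bpc_ranges = {
    "CST-Sadakane [20]": (25, 35),
    "first variant": (13, 16),
    "second variant": (8, 12),
}

for name, (low, high) in bpc_ranges.items():
    low_mb = index_size_bytes(n, low) / 1e6
    high_mb = index_size_bytes(n, high) / 1e6
    print(f"{name}: {low_mb:.1f} to {high_mb:.1f} MB")
```

Under this assumed text length, the 13-16 bpc range works out to roughly half the size of the 25-35 bpc range, which matches the "half the space" remark in the quoted statement.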


Practical Compressed Suffix Trees

2013


Abstract: The suffix tree is an extremely important data structure in bioinformatics. Classical implementations require much space, which renders them useless to handle large sequence collections. Recent research has obtained various compressed representations for suffix trees, with widely different space-time tradeoffs. In this paper we show how the use of range min-max trees yields novel representations achieving practical space/time tradeoffs. In addition, we show how those trees can be modified to index highly repetitive collections, obtaining the first compressed suffix tree representation that effectively adapts to that scenario.
