2007
DOI: 10.1007/978-3-540-72845-0_17

Engineering a Compressed Suffix Tree Implementation

Abstract: Suffix tree is one of the most important data structures in string algorithms and biological sequence analysis. Unfortunately, when it comes to implementing those algorithms and applying them to real genomic sequences, often the main memory size becomes the bottleneck. This is easily explained by the fact that while a DNA sequence of length n from alphabet Σ = {A, C, G, T} can be stored in n log |Σ| = 2n bits, its suffix tree occupies O(n log n) bits. In practice, the size difference easily reaches factor 50.…
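The space argument in the abstract can be checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it counts a single array of n text positions rather than a full suffix tree, which stores several such arrays and pointers per node and is therefore larger still.

```python
def bits_per_symbol(alphabet_size: int) -> int:
    """Smallest whole number of bits that distinguishes alphabet_size symbols."""
    return max(1, (alphabet_size - 1).bit_length())


def packed_text_bits(n: int, alphabet_size: int = 4) -> int:
    """Bits needed to store n symbols packed at a fixed width (2n for DNA)."""
    return n * bits_per_symbol(alphabet_size)


def index_array_bits(n: int) -> int:
    """Bits for one array of n text positions, each taking ceil(log2 n) bits."""
    return n * max(1, (n - 1).bit_length())


n = 1 << 20                        # about one million bases
text_bits = packed_text_bits(n)    # 2 bits per base
array_bits = index_array_bits(n)   # 20 bits per position at this length
print(text_bits, array_bits, array_bits / text_bits)
```

At this length one position array alone is ten times the size of the packed text, and the ratio grows with log n, which is consistent with the much larger gap the abstract reports for complete suffix trees.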

Cited by 17 publications (15 citation statements)
References 31 publications
“…The compressed suffix tree (CST) is considered a self-indexing data structure of reduced size that represents a suffix tree by means of arrays, simulating the main conventional functionality of the suffix tree [Välimäki et al 2009].…”
Section: Sequências: Estruturas De Dados Problemas E Algoritmos Rela
Citation type: unclassified
“…al. [22] designed an algorithm that addresses the memory challenge by using the Compressed Suffix Tree (CST). The memory requirements improved, but they remained too high for the approach to be adopted in practice. For instance, for the whole human genome, indexing takes around 4 days, and the complete tree takes around 8.5 GB, with a peak memory consumption of 24 GB.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…An array representing all the suffixes of the text in lexicographic order is called the suffix array (SA). Improving the space efficiency of the suffix array led to a new class of data structures, such as compressed suffix arrays [1][2][3][4][5] and then compressed suffix trees [6][7][8][9], with many variations in their implementations. Research on this class of data structures has been oriented towards finding an optimal space/time trade-off, either during construction or in their use as an index.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
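
The suffix array definition quoted in the statement above can be made concrete with a short sketch. It is a naive construction for illustration only, assuming a text terminated by a `$` sentinel that sorts before every other character; the function name is hypothetical, and this is not the construction used in any of the cited papers.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of text in lexicographic order.

    Sorting whole suffixes costs O(n^2 log n) time in the worst case, so this
    is only suitable for small inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


text = "banana$"
sa = naive_suffix_array(text)
for pos in sa:
    # Print each suffix next to its start position, in sorted order.
    print(pos, text[pos:])
```

Stored as plain integers, the array takes n log n bits, which is the cost that compressed suffix arrays and compressed suffix trees aim to reduce.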