Proceedings. IEEE Computer Society Bioinformatics Conference
DOI: 10.1109/csb.2002.1039352

DNA sequence compression using the Burrows-Wheeler Transform

Abstract: We investigate off-line dictionary-oriented approaches to DNA sequence compression, based on the Burrows-Wheeler Transform (BWT). The preponderance of short repeating patterns is an important phenomenon in biological sequences. Here, we propose off-line methods to compress DNA sequences that exploit the different repetition structures inherent in such sequences. Repetition analysis is performed based on the relationship between the BWT and important pattern matching data structures, such as the suffix tree a…
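
For readers unfamiliar with the transform named in the abstract, the following is a minimal Python sketch of the textbook BWT and its inverse on a short string. It is not the paper's compression method: the function names (`suffix_array`, `bwt`, `inverse_bwt`), the `$` sentinel, and the naive sorting are assumptions chosen for illustration. It only shows why the BWT is closely tied to suffix-based structures: each output symbol is the character preceding a suffix in sorted suffix order.

```python
def suffix_array(text):
    """Start positions of all suffixes of text, in sorted order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text, sentinel="$"):
    """Textbook BWT, read off the suffix array of text + sentinel.

    Assumes the sentinel does not occur in text and sorts before every
    other symbol (true for "$" against A, C, G, T in ASCII).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # For each suffix in sorted order, emit the character just before it;
    # index -1 wraps around to the sentinel for the suffix starting at 0.
    return "".join(s[i - 1] for i in suffix_array(s))


def inverse_bwt(transformed, sentinel="$"):
    """Naive inversion: rebuild the sorted rotation table column by column."""
    n = len(transformed)
    table = [""] * n
    for _ in range(n):
        # Prepend the BWT column to every row, then re-sort the rows.
        table = sorted(transformed[i] + table[i] for i in range(n))
    # The original text is the row that ends with the sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


if __name__ == "__main__":
    dna = "ACGTACGTTACG"  # hypothetical toy sequence
    transformed = bwt(dna)
    print(transformed)
    print(inverse_bwt(transformed) == dna)
```

Repeated substrings in the input share sorted-suffix neighbourhoods, so the symbols that precede them cluster in the output; that clustering is what BWT-based compressors exploit. The quadratic-or-worse sorting here is for clarity only, and a practical implementation would use a linear-time suffix array construction.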

Cited by 43 publications (34 citation statements)
References 30 publications
“…We can easily define parallel arrays to also point to the position of the longest factor to permit easy access to these factors. Direct applications of our introduced data structures may include pattern substitution, detecting duplication [6], LZ decomposition in text compression [41], studying periodicity in strings [32,39], biological sequence compression [3,21], and analysis of repetition structures in DNA sequences [22,2]. Specifically, our pLF data structure may be used to identify how to best substitute a pattern or even determine if duplication is "hidden" by reversal or with parameterization.…”
Section: Discussion (mentioning)
confidence: 99%
“…Recently, some researches in lossless compression methods commonly aim to optimize existing compression method for specific data type [7][8][9][10][11][12][13][14][15][16][17][18] or to improve the existing compression method by transforming data to other form before compression process or by combining several compression method [19][20][21][22]. One of novel researches in compression method is Asymmetric Numerical System (ANS) [23][24].…”
Section: New Lossless Compression Methods Using Crlcm (Hendra Mesra) (mentioning)
confidence: 99%
“…The selection of K is based on frequency distribution of the difference (d) of iteration number (i) and its predictions (p) in (7). As example, the frequency distribution of the difference value on Lena image is shown on Figure 7.…”
mentioning
confidence: 99%
“…DNA sequence can be very huge. For example, the human genome contains about 3.1647 billion DNA base pairs [1]. Searching patterns in the DNA sequences databases is usually the first and crucial step in DNA related research, such as DNA sequence alignment.…”
Section: Introduction (mentioning)
confidence: 99%
“…The major reason is the fact that these methods never consider certain special characteristics of biological sequences. On the contrary, the algorithms, which consider the different regularities or repetition structures that are inherent in DNA sequence, make great success [1]. BIOCOMPRESS [2], GENCOMPRESS [3] and BWT-base [4] compress are the outstanding algorithms.…”
Section: Introduction (mentioning)
confidence: 99%