2007 Data Compression Conference (DCC'07) 2007
DOI: 10.1109/dcc.2007.7

A Simple Statistical Algorithm for Biological Sequence Compression

Abstract: This paper introduces a novel algorithm for biological sequence compression that makes use of both statistical properties and repetition within sequences. A panel of experts is maintained to estimate the probability distribution of the next symbol in the sequence to be encoded. Expert probabilities are combined to obtain the final distribution. The resulting information sequence provides insight for further study of the biological sequence. Each symbol is then encoded by arithmetic coding. Experiments show tha…
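
The abstract only outlines the expert-panel idea, so the following is a minimal sketch under stated assumptions rather than the paper's method. It assumes the experts are simple order-k context models over the alphabet ACGT, and that their predictions are combined by a Bayesian-style weighted average in which each expert's weight is multiplied by the probability it gave to the symbol that actually occurred. The names `OrderKExpert` and `information_sequence` are hypothetical. The arithmetic coder is omitted; the sketch returns the per-symbol information content in bits, whose sum approximates the size an arithmetic coder would produce.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: plain nucleotide alphabet


class OrderKExpert:
    """Hypothetical expert: order-k context model with add-one smoothing."""

    def __init__(self, k):
        self.k = k
        # Every context starts with a count of 1 per symbol, so no
        # symbol is ever assigned probability zero.
        self.counts = defaultdict(lambda: {s: 1 for s in ALPHABET})

    def _context(self, history):
        return history[-self.k:] if self.k > 0 else ""

    def predict(self, history):
        c = self.counts[self._context(history)]
        total = sum(c.values())
        return {s: c[s] / total for s in ALPHABET}

    def update(self, history, symbol):
        self.counts[self._context(history)][symbol] += 1


def information_sequence(seq, orders=(0, 2, 4)):
    """Return the cost in bits of each symbol under the mixed prediction."""
    experts = [OrderKExpert(k) for k in orders]
    weights = [1.0 / len(experts)] * len(experts)
    info = []
    for i, symbol in enumerate(seq):
        history = seq[:i]
        preds = [e.predict(history) for e in experts]
        # Combine expert distributions into one final distribution.
        mixed = {s: sum(w * p[s] for w, p in zip(weights, preds))
                 for s in ALPHABET}
        info.append(-math.log2(mixed[symbol]))
        # Assumed reweighting: reward experts that predicted well.
        weights = [w * p[symbol] for w, p in zip(weights, preds)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
        for e in experts:
            e.update(history, symbol)
    return info


if __name__ == "__main__":
    bits = information_sequence("ACGT" * 50)
    print(f"average bits per symbol: {sum(bits) / len(bits):.3f}")
```

On a highly repetitive input the average cost should fall well below the 2 bits per symbol of a uniform code, and peaks in the returned list mark positions the experts found hard to predict.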

Citations: cited by 58 publications (19 citation statements)
References: 24 publications
“…Based on partial matches of subsets of the input, this model predicts the next symbols in the sequence. High compression rates are possible if the model always indicates high probabilities for the next symbol, i.e., if the prediction is reliable [12,13].…”
Section: Basic Techniques (mentioning)
confidence: 99%
“…In 1993 the first specialized DNA compressor was proposed (Grumbach and Tahi, 1993). Since then, numerous DNA compressors were developed (e.g., Cao et al, 2007, Li et al, 2013, Benoit et al, 2015, Al-Okaily et al, 2017). In our experience only two compressors pass the practicality threshold: DELIMINATE (Mohammed et al, 2012) and MFCompress (Pinho and Pratas, 2014).…”
Section: Introduction (mentioning)
confidence: 87%
“…Lempel-Ziv-based compression. Statistical algorithms [13,20] derive a predictive model from (a subset of) the input, based on partial matches. If the model always indicates high probabilities for the next symbol, then high compression rates are possible.…”
Section: Related Work (mentioning)
confidence: 99%
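
Several of the statements above note that high compression rates are possible when the model assigns high probability to the next symbol. A small sketch of why: an ideal entropy coder spends about -log2(p) bits on a symbol that was predicted with probability p. The function name `ideal_code_length_bits` and the probabilities used are illustrative assumptions, not figures from any of the cited papers.

```python
import math


def ideal_code_length_bits(probabilities):
    """Sum of -log2(p) over the probabilities a model gave to the
    symbols that actually occurred (idealized entropy-coder cost)."""
    return sum(-math.log2(p) for p in probabilities)


if __name__ == "__main__":
    n = 1000  # hypothetical sequence length
    # A model with no knowledge of a 4-letter alphabet: p = 0.25 each.
    print(ideal_code_length_bits([0.25] * n))
    # A hypothetical reliable model: p = 0.9 for every actual symbol.
    print(ideal_code_length_bits([0.9] * n))
```

The uniform model costs 2 bits per symbol, while the reliable one costs roughly 0.15 bits per symbol.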