DNA Compression Challenge Revisited: A Dynamic Programming Approach

Behzadi, Behshad; Fessant, Fabrice Le

doi:10.1007/11496656_17

Cited by 75 publications

(51 citation statements)

References 8 publications

“…Due to space limitations, we present here the most efficient algorithms, including BioCompress-2 (BioC) [12], GenCompress (GenC) [6], DNACompress (DNAC) [7], DNAPack (DNAP) [4], CDNA [15] and GeMNL [14]. Comparison with other DNA compressors can be found on the website: ftp://ftp.infotech.monash.edu.au/software/DNAcompress-XM/ The results of CDNA are reported for only 9 sequences in precision of two decimal places.…”

Section: Sequence (mentioning)

confidence: 99%

“…The CTW+LZ algorithm developed by Matsumoto et al [16] encodes significantly long repeats by the substitution method, and encodes short repeats and non repeat areas by context tree weighting [23]. At the cost of time complexity, DNAPack Behzadi and Fessant [4] employs a dynamic programming approach to find repeats. Non-repeat regions are encoded by the best choice from an order 2 Markov model, context tree weighting, and naive 2 bits per symbol methods.…”

Section: Introduction (mentioning)

confidence: 99%

A Simple Statistical Algorithm for Biological Sequence Compression

Cao, Dix, Allison, et al. 2007

2007 Data Compression Conference (DCC'07)

This paper introduces a novel algorithm for biological sequence compression that makes use of both statistical properties and repetition within sequences. A panel of experts is maintained to estimate the probability distribution of the next symbol in the sequence to be encoded. Expert probabilities are combined to obtain the final distribution. The resulting information sequence provides insight for further study of the biological sequence. Each symbol is then encoded by arithmetic coding. Experiments show that our algorithm outperforms existing compressors on typical DNA and protein sequence datasets while maintaining a practical running time.

“…For these sequences which contain small scale of repetitive fragments, this type algorithm can not obtain better performance. On the other side, the reference algorithms [5,6] attract the interesting of researchers, which reason to the high compression rate of this type algorithm. In [5], the compression rate can be 80 when the human genome sequence are encoded by using the corresponding algorithm.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, for these genome sequences with low repetitive fragments, such as the microbial genome sequence, this type algorithm is not suitable. Actually, for these genome sequences, the context based entropy coding scheme [6][7][8][9][10] is more powerful than other types of algorithms.…”

Section: Introduction (mentioning)

confidence: 99%

Genome Sequence compression algorithm based on the Distributed source coding

Shao

2016

Proceedings of the 2016 6th International Conference on Machinery, Materials, Environment, Biotechnology and Computer

Abstract. The genome sequence compression algorithm based on the distributed source coding technology purely is proposed in this paper. In order to enhance the compression efficiency, the genome sequence is mapped into two binary sources and then they are transmitted into two bilevel images. After initialization, the distributed source coding based on LDPC is constructed for compressing these two sequences. To compress the side information, the optimized context weighting is suggested. The experiments results indicate that the coding efficiency is better than results from any other compression algorithms for microbial genome sequence compression.

“…To help manage large genomic databases, compression algorithms that capture and efficiently encode this repeated information are employed. Compression algorithms specific to DNA sequences have been around for some time [6,7,8,9,10,11,12]. However, most existing algorithms are unsuitable for compressing large datasets of multiple sequences.…”

Section: Introduction (mentioning)

confidence: 99%

Survey of Compression of DNA Sequence

SinghRai, Bharti

2013

IJCA

Compression of large collections of data can lead to improvements in retrieval times by offsetting the CPU decompression costs with the cost of seeking and retrieving data from disk. In this paper, the author has study the different compression method which can compress the large DNA sequence. In this paper, authors have explored the DNA compression method that is COMRAD, which is used to compare with the dictionary based compression method i.e. LZ77, LZ78, LZW and general purpose compression method RAY. In this, authors have analyzed which one algorithm is better to compress the large collection of the DNA Sequence. Compression table and the line graph show that which compression algorithm has a better compression ratio and the compression size. It also shows that which one has better compression and decompression time.
