2019
DOI: 10.3390/e21111074

A Reference-Free Lossless Compression Algorithm for DNA Sequences Using a Competitive Prediction of Two Classes of Weighted Models

Abstract: The development of efficient data compressors for DNA sequences is crucial not only for reducing the storage and the bandwidth for transmission, but also for analysis purposes. In particular, the development of improved compression models directly influences the outcome of anthropological and biomedical compression-based methods. In this paper, we describe a new lossless compressor with improved compression capabilities for DNA sequences representing different domains and kingdoms. The reference-free method us…

citations
Cited by 16 publications
(8 citation statements)
references
References 90 publications
“… For DNA Sequence 5 (DS5), Jarvis uses the same configuration as in [ 64 ]; for DS4 and DS3 it uses Level 7. XM uses the default configuration.…”
Section: Results; citation type: mentioning
confidence: 99%
“…From all the previous algorithms, the most efficient according to compression ratio in the wide diversity of DNA sequences are XM [ 43 ], GeCo2 [ 3 ], and Jarvis [ 64 ]. These compressors apply statistical and algorithmic model mixtures combined with arithmetic encoding.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…We tested all DNA sequence compressors that are available and functional in 2020: dnaX [ 14 ], XM [ 15 ], DELIMINATE [ 16 ], Pufferfish [ 17 ], DNA-COMPACT [ 18 ], MFCompress [ 19 ], UHT [ 20 ], GeCo [ 21 ], GeCo2 [ 22 ], JARVIS [ 23 ], NAF [ 24 ], and NUHT [ 25 ]. We also included the relatively compact among homology search database formats: BLAST [ 26 ] and 2bit—a database format of BLAT [ 27 ].…”
Section: Results; citation type: mentioning
confidence: 99%
“…Therefore, methods with a compression ratio exceeding 75% are of interest. The first tools for compressing DNA sequences were developed in 1993-1994 [17,8], and new ones continue to appear today (see, for example, [19]). Alongside algorithms for compressing individual DNA sequences, "vertical" algorithms are also being developed; these encode with the help of reference sequences and record only the differences between the target and reference texts.…”
Section: Introduction; citation type: unclassified