2016
DOI: 10.3390/info7040056

A Survey on Data Compression Methods for Biological Sequences

Abstract: The ever-increasing growth in the production of high-throughput sequencing data poses a serious challenge to the storage, processing and transmission of these data. As frequently stated, it is a data deluge. Compression is essential to address this challenge: it reduces storage space and processing costs, along with speeding up data transmission. In this paper, we provide a comprehensive survey of existing compression approaches that are specialized for biological data, including protein and DNA seque…

Cited by 83 publications (51 citation statements)
References 123 publications (169 reference statements)

“…DSK [18] uses an HDF5-based encoding, KMC3 [17] combines a dense storage of prefixes with a sparse storage of suffixes, and Squeakr [20] uses a counting quotient filter [21]. The compression of read data, on the other hand, stored in either unaligned or aligned formats, has received a lot of attention [22][23][24]. In the scenario where the k-mer set to be compressed was originally generated from FASTA files by a k-mer counter, an alternate to k-mer compression is to compress the original FASTA file and use a k-mer counter as part of the decompression to extract the k-mers on the fly.…”
Section: Related Work (mentioning)
confidence: 99%
“…It is possible that new concepts and terminology will be needed to map existing taxonomic categories into the genomic reality of the 21st century. Similarly, compressed genome data storage techniques, including just mapping differences relative to reference genomes of one or more species, may be leveraged to reduce data storage, transfer, and computation bottlenecks (Hosseini et al, 2016).…”
Section: The Future of Genomic Signatures (mentioning)
confidence: 99%
“…For example, new models can be tested under this framework, namely with extended alphabets [8]. In general, any data compressor able to output local estimations can be used in the pipeline as an alternative [9].…”
Section: Validation (mentioning)
confidence: 99%