2019
DOI: 10.1146/annurev-biodatasci-072018-021229

Genomic Data Compression

Abstract: Recently, there has been growing interest in genome sequencing, driven by advances in sequencing technology, in terms of both efficiency and affordability. These developments have allowed many to envision whole-genome sequencing as an invaluable tool for both personalized medical care and public health. As a result, increasingly large and ubiquitous genomic data sets are being generated. This poses a significant challenge for the storage and transmission of these data. Already, it is more expensive to store ge…

Citations: cited by 52 publications (43 citation statements)
References: 103 publications
“…The amount of data resulting from high-throughput sequencing poses a challenge for its immediate and long-term storage. Possible solutions are to discard non-important data, when possible, and/or data compression [18]. The choice of the compressor always comes with a trade-off between compression capacity and/or speed.…”
Section: Methods (mentioning)
Confidence: 99%
“…DSK [18] uses an HDF5-based encoding, KMC3 [17] combines a dense storage of prefixes with a sparse storage of suffixes, and Squeakr [20] uses a counting quotient filter [21]. The compression of read data, on the other hand, stored in either unaligned or aligned formats, has received a lot of attention [22][23][24]. In the scenario where the k-mer set to be compressed was originally generated from FASTA files by a k-mer counter, an alternate to k-mer compression is to compress the original FASTA file and use a k-mer counter as part of the decompression to extract the k-mers on the fly.…”
Section: Related Work (mentioning)
Confidence: 99%
“…The above specific characteristics led to the development of the field of the study and construction of specific genomic data compressors [28, 29]. This field, now 27 years old, started with Biocompress [30].…”
Section: Introduction (mentioning)
Confidence: 99%