2010
DOI: 10.1007/978-3-642-12683-3_20

Compressing Genomic Sequence Fragments Using SlimGene

Abstract: With the advent of next generation sequencing technologies, the cost of sequencing whole genomes is poised to go below $1000 per human individual in a few years. As more and more genomes are sequenced, analysis methods are undergoing rapid development, making it tempting to store sequencing data for long periods of time so that the data can be re-analyzed with the latest techniques. The challenging open research problems, huge influx of data, and rapidly improving analysis techniques have created the…

Cited by 25 publications (25 citation statements)
References: 10 publications
“…Following a similar approach, [68] propose SLIMGENE, a lossless or lossy reference-based compression scheme focusing on how to find encodings of integers in order to minimize storage. In their work, they employ Huffman and arithmetic encoding.…”
Section: Referential Algorithms (mentioning)
confidence: 99%
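
The statement above mentions Huffman encoding of integers. As a rough illustration only, the sketch below builds a generic Huffman prefix code over a small list of integers and round-trips it. This is not SlimGene's actual encoding, which the statement does not describe; the `gaps` data (imagined as distances between consecutive mismatch positions in a read) and the helper names `huffman_code`, `encode`, and `decode` are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(values):
    """Build a prefix code (value -> bit string) from observed frequencies."""
    freq = Counter(values)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct value still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {value: code_so_far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {v: ""}) for i, (v, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 / 1.
        merged = {v: "0" + c for v, c in c1.items()}
        merged.update({v: "1" + c for v, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(values, code):
    """Concatenate the code words for each value."""
    return "".join(code[v] for v in values)


def decode(bits, code):
    """Invert encode(); relies on the code being prefix-free."""
    inverse = {c: v for v, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical input: small integers, skewed toward 1.
gaps = [1, 1, 2, 1, 3, 1, 1, 2, 7, 1, 1, 2]
code = huffman_code(gaps)
bits = encode(gaps, code)
```

Because frequent values receive shorter code words, the encoded string here is shorter than a fixed 3-bit-per-value layout would be (3 bits being the minimum fixed width for values up to 7). Arithmetic coding, also named in the statement, can get closer to the entropy bound but is not shown.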
“…To reduce the identification of false variant calls, the following quality control filters were applied: A) at least 75-fold sequence coverage per sample pool of five probands (at least ~15-fold per proband DNA); B) a minor allele frequency >0.05 in sample pools; C) detection in pooled high quality sequence data: major allele BaCON score >10, minor allele BaCON score >6, and BaCON score ratio <15:1 (the BaCON score is generated by Illumina and represents the discrepancy of variants. A score of ≤6 indicates variants below the threshold of detection) (24); and D) found only in LHDL or HHDL sample pools (Fig. 2).…”
Section: Identification of Novel Sequence Changes (mentioning)
confidence: 99%
“…2), 1,000 human genomes contain less than twice the unique information of one genome. Thus, although individual genomes are not very compressible [12,13], collections of related genomes are extremely compressible [14-17].…”
Section: Sublinear Analysis and Compressed Data (mentioning)
confidence: 99%
“…As more divergent genomes are added to a database, Many algorithms exist for the compression of genomic data sets purely to reduce the space required for storage and transmission [12-15,17,18]. Hsi-Yang Fritz et al. [18] provide a particularly instructive discussion of the concerns involved.…”
Section: Challenges of Compressive Algorithms (mentioning)
confidence: 99%