2013
DOI: 10.1186/1471-2105-14-187

QualComp: a new lossy compressor for quality scores based on rate distortion theory

Abstract: Background: Next Generation Sequencing technologies have revolutionized many fields in biology by reducing the time and cost required for sequencing. As a result, large amounts of sequencing data are being generated. A typical sequencing data file may occupy tens or even hundreds of gigabytes of disk space, prohibitively large for many users. This data consists of both the nucleotide sequences and per-base quality scores that indicate the level of confidence in the readout of these sequences. Quality scores acco…
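The visible abstract is truncated before the method details, but the title's reference to rate distortion theory points at the classical parallel-Gaussian bit-allocation problem. The sketch below is illustrative only: reverse water-filling over per-position variances is standard textbook machinery, and the variance model and budget shown here are assumptions, not details quoted from the paper.

```python
import math

def reverse_water_filling(variances, bits_per_read, tol=1e-9):
    """Split a per-read rate budget across quality-score positions modelled
    as independent Gaussians (illustrative sketch, not QualComp's exact model).

    Each position i gets R_i = max(0, 0.5 * log2(var_i / lam)) bits, where
    the water level `lam` is chosen so the rates sum to the budget.
    """
    lo, hi = tol, max(variances)
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        total = sum(max(0.0, 0.5 * math.log2(v / lam)) for v in variances)
        if total > bits_per_read:
            lo = lam  # spending too many bits: raise the water level
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return [max(0.0, 0.5 * math.log2(v / lam)) for v in variances]

# Example: noisier (higher-variance) positions receive more of the budget.
rates = reverse_water_filling([1.0, 2.0, 4.0, 8.0], bits_per_read=6.0)
print([round(r, 2) for r in rates])  # [0.75, 1.25, 1.75, 2.25]
```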


Cited by 48 publications (47 citation statements) · References 36 publications
“…In addition to the usual general purpose compressors, we also compared our compressive framework to QualComp [27], which features a tuning parameter to specify the number of bits needed for quality scores per read. We chose to sweep the QualComp parameter, bits per read, to match RQS' compression level and accuracy.…”
Section: Results (mentioning)
confidence: 99%
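The excerpt above describes sweeping QualComp's bits-per-read parameter until the output matches a target compression level. A hedged, minimal sketch of such a sweep is shown below; the `compress_with_budget` wrapper and the candidate values are hypothetical stand-ins, not QualComp's actual interface.

```python
def sweep_bits_per_read(compress_with_budget, reads, target_bytes,
                        candidates=(2, 4, 8, 16, 32, 64)):
    """Try a range of bits-per-read settings and return the one whose
    compressed size lands closest to `target_bytes`.

    `compress_with_budget(reads, bits)` is a hypothetical wrapper that runs
    the compressor at the given budget and returns the compressed payload.
    """
    best = None
    for bits in candidates:
        size = len(compress_with_budget(reads, bits))
        gap = abs(size - target_bytes)
        if best is None or gap < best[0]:
            best = (gap, bits, size)
    _, bits, size = best
    return bits, size
```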
“…The attempt to achieve higher compression rates than those yielded by lossless approaches such as the algorithms employed by SAMtools [3] and other optimized implementations [2] is leading to the study of new lossy schemes for QVs, such as the ones recently appearing in literature [4] [5] [6] [7]. These works point out that in some cases lossy compression of QVs does not negatively affect the quality of analysis results, but seems to actually improve performance of certain analyses such as genotyping (identification of variants with respect to a reference genome) [8].…”
Section: Introduction (mentioning)
confidence: 99%
“…By taking advantage of biological structure, both parts of NGS reads can be better compressed. Unlike some other approaches to compressing quality scores in the literature [4,26], Quartz [39] takes advantage of the fact that mid-size l-mers can in many cases almost uniquely identify locations in the genome, bounding the likelihood that a quality score is informative and allowing for lossy compression of uninformative scores. Because Quartz’s lossy compression injects information from the distribution of l-mers in the target genome, it demonstrates not only improved compression over competing approaches, but slightly improves the accuracy of downstream variant-calling.…”
Section: State-of-the-art Approaches To Meet These Challenges (mentioning)
confidence: 99%
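A simplified, hypothetical illustration of the idea in that excerpt (not Quartz's actual dictionary construction or smoothing rule): if every l-mer covering a base is found in a dictionary of common genomic l-mers, the base is very likely correct, its quality score carries little information, and it can be replaced with a fixed default.

```python
def smooth_quality(read, quals, lmer_dict, l=8, default_q=40):
    """Lossily smooth quality scores for bases whose covering l-mers all
    appear in a dictionary of common genomic l-mers (simplified sketch,
    not the actual Quartz algorithm).
    """
    out = list(quals)
    for i in range(len(read)):
        # All l-mers of the read that cover position i.
        starts = range(max(0, i - l + 1), min(i, len(read) - l) + 1)
        covering = [read[s:s + l] for s in starts]
        if covering and all(lm in lmer_dict for lm in covering):
            out[i] = default_q  # uninformative: replace with a default score
    return out

# Toy dictionary; a real one would hold frequent l-mers from the target genome.
d = {"ACGTACGT", "CGTACGTA", "GTACGTAC", "TACGTACG"}
print(smooth_quality("ACGTACGTACGT", [30] * 12, d))  # every l-mer matches: all smoothed
print(smooth_quality("ACGTACGTTCGT", [30] * 12, d))  # the T substitution breaks most covering l-mers, so those scores are kept
```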