NanoSpring: reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach

Meng, Qingxi; Chandak, Shubham; Zhu, Yu; Weissman, Tsachy

doi:10.1101/2021.06.09.447198

Cited by 2 publications

(2 citation statements)

References 29 publications


“…Nanopore sequencing, however, is a much more recent technology and few specific data compressors suitable for nanopore data are available, developed by our group [6,7] and others [8,9]. Moreover, the lossy compression of quality scores for nanopore data has only been explored in [9], where the impact of quality score information loss is assessed for some downstream analyses.…”

Section: Introduction (mentioning)

confidence: 99%

Nanopore quality score resolution can be reduced with little effect on downstream analysis

Rivara-Espasandín, Balestrazzi, Álvarez et al. 2022

Preprint


We investigate the effect of quality score information loss on downstream analysis from nanopore sequencing FASTQ files. We polished de novo assemblies for a mock microbial community and a human genome, and we called variants on a human genome. We repeated these experiments using various pipelines, under various coverage level scenarios, and with various quality score quantizers. In all cases we found that quantizing the quality scores makes little difference to (or even improves) the results obtained with the original (non-quantized) data. This suggests that the precision currently used for nanopore quality scores is unnecessarily high, and motivates the use of lossy compression algorithms for this kind of data. Moreover, we show that even a non-specialized compressor, like gzip, yields large storage space savings after quantization of quality scores.


“…Nanopore sequencing, however, is a much more recent technology and few specific data compressors suitable for nanopore data are available, developed by our group (Dufort y Álvarez et al., 2020, 2021) and others (Kokot et al., 2022; Meng et al., 2021). Moreover, the lossy compression of quality scores for nanopore data has only been explored in (Kokot et al., 2022), where the impact of quality score information loss is assessed for some downstream analyses.…”

Section: Introduction (mentioning)

confidence: 99%

Nanopore quality score resolution can be reduced with little effect on downstream analysis

Rivara-Espasandín, Balestrazzi, Álvarez et al. 2022

Bioinformatics Advances


Motivation: The use of high precision for representing quality scores in nanopore sequencing data makes these scores hard to compress and, thus, responsible for most of the information stored in losslessly compressed FASTQ files. This motivates the investigation of the effect of quality score information loss on downstream analysis from nanopore sequencing FASTQ files.

Results: We polished de novo assemblies for a mock microbial community and a human genome, and we called variants on a human genome. We repeated these experiments using various pipelines, under various coverage level scenarios, and with various quality score quantizers. In all cases we found that quantizing the quality scores makes little difference to (and sometimes even improves) the results obtained with the original (non-quantized) data. This suggests that the precision currently used for nanopore quality scores may be unnecessarily high, and motivates the use of lossy compression algorithms for this kind of data. Moreover, we show that even a non-specialized compressor, like gzip, yields large storage space savings after quantization of quality scores.

Availability: Quantizers freely available for download at: https://github.com/mrivarauy/QS-Quantizer

Supplementary information: Available at https://github.com/mrivarauy/QS-Quantizer
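The idea behind the gzip observation in the abstract can be illustrated with a minimal Python sketch. It is not the QS-Quantizer code from the repository above: the uniform binning scheme, the bin size of 8, the Phred+33 encoding, and the function names `quantize_quality` and `gzip_size` are all assumptions made for illustration, and the quality string is synthetic random data rather than real nanopore output.

```python
import gzip
import random


def quantize_quality(qual: str, bin_size: int = 8, offset: int = 33) -> str:
    """Map each quality character to the lowest score in its bin.

    Assumes Phred+offset ASCII encoding; uniform binning is a hypothetical
    quantizer chosen for simplicity, not the scheme used in the paper.
    """
    out = []
    for ch in qual:
        score = ord(ch) - offset
        out.append(chr((score // bin_size) * bin_size + offset))
    return "".join(out)


def gzip_size(text: str) -> int:
    """Return the size in bytes of the gzip-compressed ASCII text."""
    return len(gzip.compress(text.encode("ascii")))


def synthetic_quality(length: int, seed: int = 0) -> str:
    """Build a synthetic quality string with scores drawn uniformly from 1..40."""
    rng = random.Random(seed)
    return "".join(chr(33 + rng.randint(1, 40)) for _ in range(length))


if __name__ == "__main__":
    original = synthetic_quality(10_000)
    quantized = quantize_quality(original)
    print("original gzip bytes: ", gzip_size(original))
    print("quantized gzip bytes:", gzip_size(quantized))
```

Quantization shrinks the alphabet of quality symbols, so a general-purpose compressor has less information to encode per base. Real quality strings are correlated along the read, so actual savings will differ from what this synthetic input shows.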
