2020
DOI: 10.1101/2020.04.19.049262
Preprint

Impact of lossy compression of nanopore raw signal data on basecalling and consensus accuracy

Abstract: Motivation: Nanopore sequencing provides a real-time and portable solution to genomic sequencing, with long reads enabling better assembly and structural variant discovery than second generation technologies. The nanopore sequencing process generates huge amounts of data in the form of raw current data, which must be compressed to enable efficient storage and transfer. Since the raw current data is inherently noisy, lossy compression has potential to significantly reduce space requirements without adversely im…

Citations: cited by 2 publications (3 citation statements)
References: 34 publications
“…Besides improvements in accuracy, improvements in data handling will become central as the raw data are multifold larger than those obtained from short read data. Methods that improve compression of fast5 files and more space-efficient alternative file types for storing raw nanopore data are currently being developed [78,79] xi, and graphics processing unit acceleration is used routinely [80]. However, further improvements to reduce file sizes, standardizing file formats, and compute and memory-efficient algorithms will greatly reduce the barrier for larger-scale applications and adaptation.…”
Section: Discussion (mentioning)
confidence: 99%
“…Given the high sequencing depth, there is much redundancy to be exploited in the reads, and several specialized compressors like SPRING (Chandak et al., 2019) and PgRC (Kowalski and Grabowski, 2019) have been developed for this data. The typical approach used by these compressors is to efficiently build an […] with the advent of deep learning based basecallers which achieve median error rate close to 5% or better (Chandak et al., 2020), suggesting that a similar approximate assembly approach with some adaptations can be applied to nanopore sequencing reads.…”
Section: Introduction (mentioning)
confidence: 99%
“…On the other hand, nanopore reads are much longer (often over hundreds of thousands of bases long), and have a much higher error rate, including substitution, insertion, and deletion errors from the basecalling process that converts the raw current signal to the read sequences (Wick et al., 2019). However, the error rate has fallen dramatically in the recent years with the advent of deep learning based basecallers which achieve median error rate close to 5% or better (Chandak et al., 2020), suggesting that a similar approximate assembly approach with some adaptations can be applied to nanopore sequencing reads.…”
Section: Introduction (mentioning)
confidence: 99%