2021
DOI: 10.1093/bioinformatics/btab437

RENANO: a REference-based compressor for NANOpore FASTQ files

Abstract: Motivation: Nanopore sequencing technologies are rapidly gaining popularity, in part due to the massive amounts of genomic data they produce in short periods of time (up to 8.5 TB of data in < 72 hours). To reduce the costs of transmission and storage, efficient compression methods for this type of data are needed. Results: We introduce RENANO, a reference-based lossless data compressor specifically tailored to FASTQ fi…

Cited by 13 publications (10 citation statements)
References 4 publications
“…This has prompted researchers to study long-read long compression techniques in greater depth. To reduce the size of raw FASTQ raw sequencing data, several compression techniques have been proposed, including Picopore, ENANO, RENANO, NanoSpring, FastqCLS, and CoLoRd ( Gigante, 2017 ; Dufort y Álvarez et al, 2020 , 2021 ; Meng et al, 2021 ; Kokot et al, 2022 ; Lee and Song, 2022 ). As an example, Picopore ( Gigante, 2017 ) provides a software suite that contains three compression methods: raw, lossless and deep lossless compression.…”
Section: Bioinformatics Of Nanopore Sequencing
Citation type: mentioning
confidence: 99%
“…To evaluate the improvement in compressibility obtained by quantizing the quality scores of nanopore FASTQ files, we used sample HG003 (the same data set used for variant calling evaluation), which consists of three FASTQ files that add up to approximately 520 GB. We compressed the original data set and quantized versions of this data set using the general purpose compressor gzip 7. For each evaluated quantizer we calculated the compression ratio, defined as the quotient between the size of the compressed data set and the size of the original data set (smaller ratios correspond to better compression performance).…”
Section: Compressibility
Citation type: mentioning
confidence: 99%
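
The compression ratio defined in the statement above (compressed size divided by original size, smaller is better) can be illustrated with a minimal Python sketch. It is not the cited authors' pipeline: the two-level quantizer, its threshold and output values, and the synthetic quality string are all hypothetical stand-ins chosen for illustration, and only the standard-library `gzip` module is used.

```python
import gzip
import random


def compression_ratio(data: bytes, compresslevel: int = 9) -> float:
    """Return compressed size / original size (smaller means better compression)."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(gzip.compress(data, compresslevel=compresslevel)) / len(data)


def quantize_two_level(quals: bytes, threshold: int = 10) -> bytes:
    """Hypothetical quantizer: map each Phred+33 quality score to one of two values."""
    low, high = 33 + 5, 33 + 15  # assumed representative scores, not from the source
    return bytes(high if q - 33 >= threshold else low for q in quals)


# Synthetic quality string (Phred+33, scores 0..39); a stand-in for real FASTQ data.
rng = random.Random(0)
quals = bytes(33 + rng.randrange(40) for _ in range(10_000))

print(f"original : {compression_ratio(quals):.3f}")
print(f"quantized: {compression_ratio(quantize_two_level(quals)):.3f}")
```

Because the quantized string uses only two distinct symbols, gzip should report a smaller ratio for it than for the original string. Real quality scores are not uniformly distributed, so the actual gain depends on the data and on the quantizer.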
“…Nanopore sequencing, however, is a much more recent technology and few specific data compressors suitable for nanopore data are available, developed by our group [6,7] and others [8,9]. Moreover, the lossy compression of quality scores for nanopore data has only been explored in [9], where the impact of quality score information loss is assessed for some downstream analyses.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Quality scores have also been compressed lossily without an impact on the downstream performance for short-read technologies 8, 9 and more recently for nanopore itself 10, 11. RENANO 12 is a recent reference-based compressor that achieves significantly better compression for read sequences, but is limited to aligned data with a reference available. Most recently, CoLoRd 10 included both a reference-free and reference-based compressor using overlap graph based approach, achieving significant improvement over ENANO in the reference-free mode at the cost of higher resource usage.…”
Section: Introduction
Citation type: mentioning
confidence: 99%