Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach

Meng, Qingxi; Chandak, Shubham; Zhu, Yifan; Weissman, Tsachy

doi:10.1038/s41598-023-29267-8

Cited by 3 publications (2 citation statements)

References 23 publications


“…This has prompted researchers to study long-read compression techniques in greater depth. To reduce the size of raw FASTQ sequencing data, several compression techniques have been proposed, including Picopore, ENANO, RENANO, NanoSpring, FastqCLS, and CoLoRd (Gigante, 2017; Dufort y Álvarez et al., 2020, 2021; Meng et al., 2021; Kokot et al., 2022; Lee and Song, 2022). As an example, Picopore (Gigante, 2017) provides a software suite that contains three compression methods: raw, lossless, and deep lossless compression.…”

Section: Bioinformatics of Nanopore Sequencing (mentioning; confidence: 99%)

“…In comparison to ENANO, RENANO (Dufort y Álvarez et al., 2021) achieves significantly better compression of read sequences but is limited to aligned data with a usable reference. In contrast, NanoSpring (Meng et al., 2021) is a reference-free tool that relies on approximate assembly to achieve compression gains, but it requires more time and memory to do so. The FastqCLS (Lee and Song, 2022) compression algorithm uses read reordering to compress long-read sequencing data without losing information and performs well in terms of compression ratio.…”

Section: Bioinformatics of Nanopore Sequencing (mentioning; confidence: 99%)


Portable nanopore-sequencing technology: Trends in development and applications

et al. 2023


Sequencing technology is the most commonly used technology in molecular biology research and an essential pillar for the development and applications of molecular biology. Since 1977, when the first generation of sequencing technology opened the door to interpreting the genetic code, sequencing technology has evolved through three generations. It has applications in all areas of life and scientific research, such as disease diagnosis, drug target discovery, pathological research, species protection, and SARS-CoV-2 detection. However, first- and second-generation sequencing technologies relied on fluorescence detection systems and DNA polymerase enzyme systems, which increased the cost of sequencing and limited its scope of application. Third-generation sequencing technology performs PCR-free, single-molecule sequencing, but it still depends on fluorescence detection devices. To overcome these limitations, researchers have worked hard to develop new portable sequencing technologies, represented by nanopore sequencing. Nanopore technology offers small size and convenient portability, independence from biochemical reagents, and direct reading by physical methods. This paper reviews the development of nanopore sequencing technology (NST) from the laboratory to commercially viable tools, and discusses the main types of nanopore sequencing technologies and their applications to a wide range of real-world problems. In addition, the paper collates the analysis tools needed for different processing tasks in nanopore sequencing. Finally, we highlight the challenges facing NST and its future research and application directions.


Lossless Compression of Nanopore Sequencing Raw Signals

Castelli, González, Torrado, et al. 2024

Lecture Notes in Computer Science


PMFFRC: a large-scale genomic short reads compression optimizer via memory modeling and redundant clustering

Sun, Zheng, Xie, et al. 2023

BMC Bioinformatics


Background: Genomic sequencing read compressors are essential for balancing the speed of high-throughput short-read generation, large-scale genomic data sharing, and storage infrastructure costs. However, most existing short-read compressors rarely exploit big-memory systems or the redundant information shared across different sequencing files to achieve higher compression ratios and save read storage space.

Results: We take compression ratio as the optimization objective and propose PMFFRC, a large-scale genomic short-read compression optimizer built on novel memory modeling and redundant-read clustering techniques. When cascaded with PMFFRC on 982 GB of FASTQ-format sequencing data containing 3.3 billion short reads (274 GB), the state-of-the-art reference-free compressors HARC, SPRING, Mstcom, and FastqCLS achieve average maximum compression ratio gains of 77.89%, 77.56%, 73.51%, and 29.36%, respectively. PMFFRC saves 39.41%, 41.62%, 40.99%, and 20.19% of storage space compared with the four unoptimized compressors.

Conclusions: PMFFRC makes rational use of the compression server's large memory and effectively reduces the storage space required for sequencing reads, lowering both storage infrastructure costs and the overhead of sharing and transmitting data within the community. Our work offers a new approach to improving sequencing read compression and saving storage space. The PMFFRC algorithm is packaged in a Linux toolkit of the same name, freely available at https://github.com/fahaihi/PMFFRC.
