Nanopore sequencing has been widely used for the reconstruction of microbial genomes. Owing to higher error rates, errors on the genome are corrected via neural networks trained by Nanopore reads. However, the systematic errors usually remain uncorrected. This paper designs a model that is trained by homologous sequences for the correction of Nanopore systematic errors. The developed program, Homopolish, outperforms Medaka and HELEN in bacteria, viruses, fungi, and metagenomic datasets. When combined with Medaka/HELEN, the genome quality can exceed Q50 on R9.4 flow cells. We show that Nanopore-only sequencing can produce high-quality microbial genomes sufficient for downstream analysis.
Nanopore sequencing has been widely used for reconstruction of a variety of microbial genomes. Owing to the higher error rate, the assembled genome requires further error correction. Existing methods erase many of these errors via deep neural network trained from Nanopore reads. However, quite a few systematic errors are still left on the genome. This paper proposed a new model trained from homologous sequences extracted from closely-related genomes, which provides valuable features missed in Nanopore reads. The developed program (called Homopolish) outperforms the state-of-the-art Racon/Medaka and MarginPolish/HELEN pipelines in metagenomic and isolates of bacteria, viruses and fungi. When Homopolish is combined with Medaka or with HELEN, the genomes quality can exceed Q50 on R9.4 flowcells. The genome quality can be also improved on R10.3 flowcells (Q50-Q90). We proved that Nanopore-only sequencing can now produce high-quality genomes without the need of Illumina hybrid sequencing.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.