Motivation: Next-generation sequencing tools have enabled producing of huge amount of genomic information at low cost. Unfortunately, presence of sequencing errors in such data affects quality of downstream analyzes. Accuracy of them can be improved by performing error correction. Because of huge amount of such data correction algorithms have to: be fast, memory-frugal, and provide high accuracy of error detection and elimination for variously-sized organisms. Results: We introduce a new algorithm for genomic data correction, capable of processing eucaryotic 300 Mbp-genome-size, high error-rated data using less than 4 GB of RAM in less than 40 minutes on 16-core CPU. The algorithm allows to correct sequencing data at better or comparable level than competitors. This was achieved by using very robust KMC 2 k-mer counter, new method of erroneous regions correction based on both k-mer counts and FASTQ quality indicators as well as careful optimization. Availability: Program is freely available at http://sun.aei.posl.pl/REFRESH.
Summary: Kmer-db is a new tool for estimating evolutionary relationship on the basis of k-mers extracted from genomes or sequencing reads. Thanks to an efficient data structure and parallel implementation, our software estimates distances between 40,715 pathogens in less than 4 minutes (on a modern workstation), 44 times faster than Mash, its main competitor. Availability and Implementation: https:/
In this paper we introduce RADULS2, the fastest parallel sorter based on radix algorithm. It is optimized to process huge amounts of data making use of modern multicore CPUs. The main novelties include: extremely optimized algorithm for handling tiny arrays (up to about a hundred of records) that could appear even billions times as subproblems to handle and improved processing of larger subarrays with better use of non-temporal memory stores.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.