RACER is freely available for non-commercial use at www.csd.uwo.ca/~ilie/RACER/.
Next-generation sequencing technologies have revolutionized the ways in which genetic information is obtained and have opened the door for many essential applications in biomedical sciences. Hundreds of gigabytes of data are being produced, and all applications are affected by the errors in the data. Many programs have been designed to correct these errors, most of them targeting the data produced by the dominant technology, Illumina. We present a thorough comparison of these programs. Both HiSeq and MiSeq types of Illumina data are analyzed, and correction performance is evaluated as the gain in depth and breadth of coverage, as given by correct reads and k-mers. Time and memory requirements, scalability and parallelism are considered as well. Practical guidelines are provided for the effective use of these tools. We also evaluate the efficiency of the current state-of-the-art programs for correcting Illumina data and provide research directions for further improvement.
Background: De novo genome assembly of next-generation sequencing data is one of the most important current problems in bioinformatics, essential in many biological applications. In spite of a significant amount of work in this area, better solutions are still very much needed.
Results: We present a new program, SAGE, for de novo genome assembly. As opposed to most assemblers, which are de Bruijn graph based, SAGE uses the string-overlap graph. SAGE builds upon great existing work on string-overlap graphs and maximum likelihood assembly, bringing a number of important new ideas, such as the efficient computation of the transitive reduction of the string-overlap graph, the use of (generalized) edge multiplicity statistics for more accurate estimation of read copy counts, and the improved use of mate pairs and min-cost flow for supporting edge merging. The assemblies produced by SAGE for several short and medium-size genomes compared favourably with those of existing leading assemblers.
Conclusions: SAGE benefits from innovations in almost every aspect of the assembly process: error correction of input reads, string-overlap graph construction, read copy count estimation, overlap graph analysis and reduction, contig extraction, and scaffolding. We hope that these new ideas will help advance the current state-of-the-art in an essential area of research in genomics.
Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-15-302) contains supplementary material, which is available to authorized users.
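To make the idea of transitive reduction on a string-overlap graph concrete, here is a minimal Python sketch. It is not SAGE's algorithm: the abstract does not describe the efficient method, so this uses a naive quadratic overlap search and a simple "drop u->w when u->v and v->w exist" rule. The function names (`overlap_len`, `build_overlap_graph`, `transitive_reduction`), the toy reads, and the `min_len` threshold are all hypothetical, and the sketch assumes error-free, forward-strand reads without repeats.

```python
from collections import defaultdict


def overlap_len(a, b, min_len):
    """Longest proper suffix of a that equals a prefix of b, or 0 if shorter than min_len."""
    max_len = min(len(a), len(b)) - 1  # proper overlaps only (no containment)
    for k in range(max_len, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_len):
    """Naive all-pairs construction: graph[i][j] = overlap length for edge i -> j."""
    graph = defaultdict(dict)
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap_len(a, b, min_len)
                if k:
                    graph[i][j] = k
    return graph


def transitive_reduction(graph):
    """Remove edge u -> w whenever edges u -> v and v -> w both exist (naive rule)."""
    reduced = {u: dict(vs) for u, vs in graph.items()}
    for u, vs in graph.items():
        for v in vs:
            for w in graph.get(v, {}):
                if w in vs and w != u:
                    reduced[u].pop(w, None)
    return reduced


# Toy example: three reads taken at offsets 0, 2 and 4 of "ATGCGTACCT".
reads = ["ATGCGT", "GCGTAC", "GTACCT"]
graph = build_overlap_graph(reads, min_len=2)
reduced = transitive_reduction(graph)
```

In this toy case the short overlap from the first read to the third is implied by the two longer overlaps through the middle read, so the reduction leaves a simple path over the three reads. A real assembler would also have to handle reverse complements, sequencing errors, and repeats, which this sketch ignores.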
Supplementary data are available at Bioinformatics online.
The Enhanced-Input Terminal project is directed at providing major new degrees of freedom for touch-type computer input, especially for on-line use of interactive computer systems. The terminal comprises an integrated system of hardware and software. While various choices are available for the actual input and output devices, the present prototype utilizes video display for both devices and a "cross-wire", touch-sensitive input panel. The EITS allows an "author" to define an essentially infinite set of symbols, and an infinite variety of "keyboard" formats. Chord inputs (i.e., simultaneous, multiple-"key" combinations) are also supported. Symbols can be defined in terms of dot matrices, generalized graphics, symbol strings, and functional operations. In spite of the complete generality afforded, the integrated system develops a standard type of binary-bit-coded input stream, in which the individual symbols are uniquely and canonically represented, and which is amenable to all of the usual "text-file" operations, such as character manipulation, editing, transmission and re-display.