Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand.Here we present a community-driven curation effort, supported by ELIXIR—the European infrastructure for biological information—that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners.As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.
Background The rapid increase in High-throughput sequencing of RNA (RNA-seq) has led to tremendous improvements in the detection and reconstruction of both expressed coding and non-coding RNA transcripts. Yet, the complete and accurate annotation of the complex transcriptional output of not only the human genome has remained elusive. One of the critical bottlenecks in this endeavor is the computational reconstruction of transcript structures, due to high noise levels, technological limits, and other biases in the raw data. Results We introduce several new and improved algorithms in a novel workflow for transcript assembly and quantification. We propose an extension of the common splice graph framework that combines aspects of overlap and bin graphs and makes it possible to efficiently use both multi-splice and paired-end information to the fullest extent. Phasing information of reads is used to further resolve loci. The decomposition of read coverage patterns is modeled as a minimum-cost flow problem to account for the unavoidable non-uniformities of RNA-seq data. Conclusion Its performance compares favorably with state of the art methods on both simulated and real-life datasets. Ryūtō calls 1−4 % more true transcripts, while calling 5−35 % less false predictions compared to the next best competitor. Electronic supplementary material The online version of this article (10.1186/s12859-019-2786-5) contains supplementary material, which is available to authorized users.
We describe a new approach to assemble genomes from a combination of low-coverage short and long reads. LazyBastard starts from a bipartite overlap graph between long reads and restrictively filtered short-read unitigs, which are then reduced to a long-read overlap graph G. Edges are removed from G to obtain first a consistent orientation and then a DAG. Using heuristics based on properties of proper interval graphs, contigs are extracted as maximum weight paths. These are translated into genomic sequence only in the final step. A prototype implementation of LazyBastard, entirely written in python, not only yields significantly more accurate assemblies of the yeast and fruit fly genomes compared to state-of-the-art pipelines but also requires much less computational effort.
Motivation Accurate assembly of RNA-seq is a crucial step in many analytic tasks such as gene annotation or expression studies. Despite ongoing research, progress on traditional single sample assembly has brought no major breakthrough. Multi-sample RNA-Seq experiments provide more information than single sample datasets and thus constitute a promising area of research. Yet, this advantage is challenging to utilize due to the large amount of accumulating errors. Results We present an extension to Ryūtō enabling the reconstruction of consensus transcriptomes from multiple RNA-seq data sets, incorporating consensus calling at low level features. We report stable improvements already at 3 replicates. Ryūtō outperforms competing approaches, providing a better and user-adjustable sensitivity-precision trade-off. Ryūtō’s unique ability to utilize a (incomplete) reference for multi sample assemblies greatly increases precision. We demonstrate benefits for differential expression analysis. Conclusion Ryūtō consistently improves assembly on replicates of the same tissue independent of filter settings, even when mixing conditions or time series. Consensus voting in Ryūtō is especially effective at high precision assembly, while Ryūtō’s conventional mode can reach higher recall. Availability Ryūtō is available at https://github.com/studla/RYUTO Supplementary information Supplementary data are available at Bioinformatics online.
Background Advances in genome sequencing over the last years have lead to a fundamental paradigm shift in the field. With steadily decreasing sequencing costs, genome projects are no longer limited by the cost of raw sequencing data, but rather by computational problems associated with genome assembly. There is an urgent demand for more efficient and and more accurate methods is particular with regard to the highly complex and often very large genomes of animals and plants. Most recently, “hybrid” methods that integrate short and long read data have been devised to address this need. Results is such a hybrid genome assembler. It has been designed specificially with an emphasis on utilizing low-coverage short and long reads. starts from a bipartite overlap graph between long reads and restrictively filtered short-read unitigs. This graph is translated into a long-read overlap graph G. Instead of the more conventional approach of removing tips, bubbles, and other local features, stepwisely extracts subgraphs whose global properties approach a disjoint union of paths. First, a consistently oriented subgraph is extracted, which in a second step is reduced to a directed acyclic graph. In the next step, properties of proper interval graphs are used to extract contigs as maximum weight paths. These path are translated into genomic sequences only in the final step. A prototype implementation of , entirely written in python, not only yields significantly more accurate assemblies of the yeast and fruit fly genomes compared to state-of-the-art pipelines but also requires much less computational effort. Conclusions is new low-cost genome assembler that copes well with large genomes and low coverage. It is based on a novel approach for reducing the overlap graph to a collection of paths, thus opening new avenues for future improvements. Availability The prototype is available at https://github.com/TGatter/LazyB.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.