Background: Reactome aims to provide bioinformatics tools for visualisation, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modelling, systems biology and education. Pathway analysis methods have a broad range of applications in physiological and biomedical research; from the point of view of analysis performance, one of the main problems is the constantly increasing size of the data samples.

Results: Here, we present a new high-performance in-memory implementation of the well-established over-representation analysis method. To achieve this, the over-representation analysis method is divided into four steps and, for each of them, specific data structures are used to improve performance and minimise the memory footprint. The first step, finding out whether an identifier in the user's sample corresponds to an entity in Reactome, is addressed using a radix tree as a lookup table. The second step, modelling the proteins, chemicals, their orthologues in other species and their composition in complexes and sets, is addressed with a graph. The third and fourth steps, which aggregate the results and calculate the statistics, are solved with a doubly linked tree.

Conclusion: Through the use of highly optimised, in-memory data structures and algorithms, Reactome has achieved a stable, high-performance pathway analysis service that analyses genome-wide datasets within seconds and allows interactive exploration and analysis of high-throughput data. The proposed pathway analysis approach is available on the Reactome production web site, either via the AnalysisService for programmatic access or via the user submission interface integrated into the PathwayBrowser. Reactome is an open data and open source project, and all of its source code, including the code described here, is available in the AnalysisTools repository in the Reactome GitHub (https://github.com/reactome/).
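The identifier lookup and the statistics step described in the abstract can be illustrated with a minimal Python sketch. It is not the Reactome implementation: the `Trie` class is a plain, uncompressed character trie standing in for a radix tree, and `hypergeom_pvalue` uses a hypergeometric tail probability, a common choice for over-representation analysis that is assumed here because the abstract does not name the statistic. All class, function, and identifier names are hypothetical.

```python
from math import comb


class Trie:
    """Plain character trie, used here as a simple stand-in for a radix tree."""

    def __init__(self):
        self.children = {}   # maps one character to the next node
        self.entity = None   # payload stored at the end of a known identifier

    def insert(self, identifier, entity):
        node = self
        for ch in identifier:
            node = node.children.setdefault(ch, Trie())
        node.entity = entity

    def lookup(self, identifier):
        """Return the entity for an exact identifier match, else None."""
        node = self
        for ch in identifier:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.entity


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of at least k pathway hits when n entities are drawn
    from a background of N entities, K of which belong to the pathway."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def analyse(sample, trie, pathways, background_size):
    """Map sample identifiers to entities, then score each pathway."""
    found = {trie.lookup(s) for s in sample} - {None}
    results = {}
    for name, members in pathways.items():
        hits = len(found & members)
        results[name] = hypergeom_pvalue(hits, len(found), len(members), background_size)
    return results


if __name__ == "__main__":
    # Hypothetical toy data: identifiers map to themselves as entities.
    trie = Trie()
    for ident in ("P1", "P2", "P3", "P4"):
        trie.insert(ident, ident)
    pathways = {"toy_pathway": {"P1", "P2", "P3"}}
    print(analyse(["P1", "P2", "X9"], trie, pathways, background_size=20))
```

A real radix tree compresses chains of single-child nodes into one edge, which reduces memory use; the lookup contract (exact identifier in, entity or nothing out) is the same as in this sketch.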
The program is available upon request from the authors and is free for academic users. Additional information is available at http://www.uv.es/genomica/UVCLUSTER.
Human blood metagenomics has revealed the presence of different types of viruses in apparently healthy subjects. Anelloviruses are by far the viral family most frequently found in human blood, although amplification biases and contamination pose a major challenge in this field. To investigate this further, we subjected pooled plasma samples from 120 healthy donors in Spain to high-speed centrifugation, RNA and DNA extraction, random amplification, and massively parallel sequencing. Our results confirm the extensive presence of anelloviruses in such samples, which represented nearly 97% of the total viral sequence reads obtained. We assembled 114 different viral genomes belonging to this family, revealing remarkable diversity. Phylogenetic analysis of ORF1 suggested 28 potentially novel anellovirus species, 24 of which were validated by Sanger sequencing to rule out artifacts. These findings underscore the importance of implementing more efficient purification procedures that enrich the viral fraction as an essential step in virome studies, and they call into question the suggested pathological role of anelloviruses.