Motivation: Information on protein-protein interactions is collected in numerous primary databases, each with its own curation process. Several meta-databases aggregate primary databases to provide more exhaustive datasets. In addition to exhaustiveness, aggregation contributes to reliability by providing an overview of the various studies and detection methods supporting an interaction. However, interactions listed in different primary databases are partly redundant because some publications reporting protein-protein interactions have been curated by multiple primary databases. Mere aggregation can thus introduce a bias if these redundancies are not identified and eliminated. To overcome this bias, meta-databases rely on the Molecular Interaction ontology that describes interaction detection methods, but they do not take full advantage of the ontology's rich semantics, which leads to systematically overestimating interaction reproducibility. Results: We propose a precise definition of explicit and implicit redundancy, and show that both can be easily detected using Semantic Web technologies. We apply this process to a dataset from the APID meta-database and show that while explicit redundancies were detected by the APID aggregation process, about 15% of APID entries are implicitly redundant and should not be taken into account when presenting confidence-related metrics. More than 90% of implicit redundancies result from the aggregation of distinct primary databases, while the remainder occur between entries of a single database. Finally, we build a "reproducible interactome" with interactions that have been reproduced by multiple methods or publications. The size of the reproducible interactome is drastically reduced by removing redundancies for both yeast (-59%) and human (-56%), and we show that this is largely due to implicit redundancies.
Availability: Software, data and results are available at https://gitlab.com/nnet56/reproducible-interactome, https://reproducible-interactome.genouest.org/, Zenodo (doi: 10.5281/zenodo.5595037) and NDEx (doi: 10.18119/N94302, doi: 10.18119/N97S4D). Supplementary information: Supplementary data are available at Bioinformatics online.
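The abstract does not spell out its definitions of explicit and implicit redundancy, so the following is only a minimal sketch under an assumed reading: two entries for the same protein pair and the same publication are explicitly redundant when they cite the same detection method, and implicitly redundant when one method is an ancestor of the other in the method hierarchy. The hierarchy, protein names, and publication identifiers below are hypothetical toy data, not content taken from the Molecular Interaction ontology or from APID, and plain Python stands in for the Semantic Web tooling the authors describe.

```python
from itertools import combinations

# Toy "is_a" hierarchy (child -> parent). Illustrative only; it is NOT
# copied from the Molecular Interaction ontology.
PARENT = {
    "two hybrid array": "two hybrid",
    "two hybrid": "protein complementation assay",
    "protein complementation assay": "interaction detection method",
    "affinity chromatography": "interaction detection method",
}

def ancestors(method):
    """Return the set of strict ancestors of a method in the toy hierarchy."""
    found = set()
    while method in PARENT:
        method = PARENT[method]
        found.add(method)
    return found

def classify(entry_a, entry_b):
    """Classify two (protein pair, publication, method) entries as
    'explicit', 'implicit', or None (not redundant), under the assumed
    definitions stated in the lead-in."""
    pair_a, pub_a, meth_a = entry_a
    pair_b, pub_b, meth_b = entry_b
    if pair_a != pair_b or pub_a != pub_b:
        return None
    if meth_a == meth_b:
        return "explicit"
    if meth_a in ancestors(meth_b) or meth_b in ancestors(meth_a):
        return "implicit"
    return None

# Hypothetical entries, as if aggregated from several primary databases.
entries = [
    (frozenset({"P1", "P2"}), "pub1", "two hybrid"),
    (frozenset({"P1", "P2"}), "pub1", "two hybrid array"),
    (frozenset({"P1", "P2"}), "pub1", "two hybrid"),
    (frozenset({"P1", "P2"}), "pub2", "affinity chromatography"),
]

redundancies = []
for i, j in combinations(range(len(entries)), 2):
    kind = classify(entries[i], entries[j])
    if kind is not None:
        redundancies.append((i, j, kind))

print(redundancies)
```

In this toy dataset, only the first three entries are redundant with one another; the fourth comes from a different publication and an unrelated method, so it would count as an independent observation of the interaction.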
Motivation: Molecular complexes play a major role in the regulation of biological pathways. The Biological Pathway Exchange format (BioPAX) facilitates the integration of data sources describing interactions, some of which involve complexes. The BioPAX specification explicitly prevents complexes from having any component that is another complex (unless this component is a black-box complex whose composition is unknown). However, we observed that the well-curated Reactome pathway database contains such recursive complexes of complexes. We propose reproducible and semantically rich SPARQL queries for identifying and fixing invalid complexes in BioPAX databases, and evaluate the consequences of fixing these non-conformities in the Reactome database. Results: For the Homo sapiens version of Reactome, we identify 5,833 recursively defined complexes out of the 14,987 complexes (39%). This situation is not specific to the human dataset, as all tested species of Reactome exhibit between 30% (Plasmodium falciparum) and 40% (Sus scrofa, Bos taurus, Canis familiaris, Gallus gallus) of recursive complexes. As an additional consequence, the procedure also allows the detection of complex redundancies. Overall, this method improves the conformity and the automated analysis of the graph by repairing the topology of the complexes in the graph. This will make it possible to apply further reasoning methods to more consistent data. Availability: We provide a Jupyter notebook detailing the analysis at https://github.com/cjuigne/non_conformities_detection_biopax. Supplementary information: Supplementary data are available at Bioinformatics online.
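The authors work with SPARQL queries over BioPAX data; the sketch below swaps in a plain Python dictionary to illustrate the underlying idea only. It assumes that a complex is "recursive" when at least one of its direct components is itself a complex with known components, that a black-box complex is one with an empty component list, and that the data contain no cycles. All names are hypothetical, and stoichiometry is ignored.

```python
# Hypothetical toy data: complex name -> list of direct components.
# Names that are not keys (A, B, C, ...) stand for non-complex entities.
COMPONENTS = {
    "C1": ["A", "B"],
    "C2": ["C1", "C"],   # recursive: contains complex C1
    "C3": ["C2", "D"],   # recursive: contains complex C2
    "BB": [],            # black-box complex, composition unknown
    "C4": ["BB", "E"],   # allowed: its complex component is a black box
}

def is_recursive(name):
    """True if a direct component is a complex with known components."""
    return any(COMPONENTS.get(part) for part in COMPONENTS[name])

def flatten(name):
    """Return the leaf components of a complex, expanding nested
    complexes but keeping black-box complexes as leaves.
    Assumes the component graph is acyclic."""
    leaves = []
    for part in COMPONENTS[name]:
        if COMPONENTS.get(part):
            leaves.extend(flatten(part))
        else:
            leaves.append(part)
    return leaves

recursive = sorted(name for name in COMPONENTS if is_recursive(name))
print(recursive)
print(flatten("C3"))
```

Once complexes are flattened this way, two complexes with the same set of leaf components become directly comparable, which is one way the redundancies mentioned in the abstract could surface.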
Background: Feed efficiency is a research priority to support sustainable meat production. It is recognized as a complex trait that integrates multiple biological pathways orchestrated in and by various tissues. This study aims to determine networks between biological entities to explain inter-individual variation of feed efficiency in growing pigs. Results: The feed conversion ratio (FCR), a measure of feed efficiency, and its two component traits, average daily gain and average daily feed intake, were obtained from 47 growing pigs from a divergent selection for residual feed intake and fed high-starch or high-fat high-fiber diets for 58 days. Datasets of transcriptomics (60 k porcine microarray) in the whole blood and metabolomics (1H-NMR analysis and target gas chromatography) in plasma were available for all pigs at the end of the trial. A weighted gene co-expression network was built from the transcriptomics dataset, resulting in 33 modules of co-expressed molecular probes. The eigengenes of eight of these modules were significantly (P ≤ 0.05) or tended to be (0.05 < P ≤ 0.10) correlated with FCR. Great homogeneity in the enriched biological pathways was observed in these modules, suggesting co-expressed and co-regulated constitutive genes. They were mainly enriched in genes participating in immune and defense-related processes. They were also generally associated with growth rate and percentage of lean mass. In the network, only one module, composed of genes participating in the response to substances, was significantly associated with daily feed intake and body adiposity. The profiles in circulating metabolites and fatty acids of plasma were summarized by weighted linear combinations using a dimensionality reduction method.
A close association was notably found between the circulating concentrations of omega-3 fatty acids in plasma and a module related to FCR, composed of co-expressed genes participating in T cell receptor signaling and cell development process in the whole blood. Conclusion: These systemic approaches have highlighted networks of entities driving key biological processes involved in the phenotypic difference in feed efficiency between animals. Connecting transcriptomics and metabolic levels together had some additional benefits.
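To make the notion of a module eigengene correlated with FCR more concrete, here is a minimal sketch that does not reproduce the authors' pipeline. It assumes the eigengene is taken as the first principal component of the standardized expression of a module's genes, approximates it by power iteration in plain Python, and computes a Pearson correlation with a trait. The five animals, three genes, and all values are invented for illustration.

```python
from statistics import mean, pstdev

def standardize(values):
    """Center and scale one gene's expression across animals."""
    m, s = mean(values), pstdev(values)
    return [(x - m) / s for x in values]

def eigengene(genes, iterations=100):
    """Approximate the first principal component scores of a module.
    genes: list of per-gene expression lists (one value per animal)."""
    z = [standardize(g) for g in genes]
    n_genes, n_animals = len(z), len(z[0])
    loadings = [1.0] * n_genes
    for _ in range(iterations):
        # Project animals onto the current loadings, then update loadings.
        scores = [sum(loadings[g] * z[g][i] for g in range(n_genes))
                  for i in range(n_animals)]
        loadings = [sum(z[g][i] * scores[i] for i in range(n_animals))
                    for g in range(n_genes)]
        norm = sum(x * x for x in loadings) ** 0.5
        loadings = [x / norm for x in loadings]
    return [sum(loadings[g] * z[g][i] for g in range(n_genes))
            for i in range(n_animals)]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Invented toy data: FCR for five animals and one co-expressed module.
fcr = [2.4, 2.6, 2.8, 3.0, 3.2]
module = [
    [1.0, 2.1, 2.9, 4.2, 5.0],
    [0.5, 1.1, 1.4, 2.2, 2.4],
    [3.0, 3.9, 5.2, 6.1, 6.8],
]

module_eigengene = eigengene(module)
r = pearson(module_eigengene, fcr)
print(round(r, 3))
```

The sign of an eigengene is arbitrary, so in practice the absolute value of the correlation, together with a significance test over the 47 animals, is what would be interpreted.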