The integration of data and knowledge from heterogeneous sources can be a key success factor in drug design, drug repurposing and multi-target therapies. In this context, biological networks provide a useful instrument to highlight the relationships and to model the phenomena underlying therapeutic action in cancer. In our work, we applied network-based modeling within a novel bioinformatics pipeline to identify promising multi-target drugs. Given a certain tumor type/subtype, we derive a disease-specific Protein-Protein Interaction (PPI) network by combining different data-bases and knowledge repositories. Next, the application of suitable graph-based algorithms allows selecting a set of potentially interesting combinations of drug targets. A list of drug candidates is then extracted by applying a recent data fusion approach based on matrix tri-factorization. Available knowledge about selected drugs mechanisms of action is finally exploited to identify the most promising candidates for planning in vitro studies. We applied this approach to the case of Triple Negative Breast Cancer (TNBC), a subtype of breast cancer whose biology is poorly understood and that lacks of specific molecular targets. Our “in-silico” findings have been confirmed by a number of in vitro experiments, whose results demonstrated the ability of the method to select candidates for drug repurposing.
Objective Computing patients’ similarity is of great interest in precision oncology since it supports clustering and subgroup identification, eventually leading to tailored therapies. The availability of large amounts of biomedical data, characterized by large feature sets and sparse content, motivates the development of new methods to compute patient similarities able to fuse heterogeneous data sources with the available knowledge. Materials and Methods In this work, we developed a data integration approach based on matrix trifactorization to compute patient similarities by integrating several sources of data and knowledge. We assess the accuracy of the proposed method: (1) on several synthetic data sets which similarity structures are affected by increasing levels of noise and data sparsity, and (2) on a real data set coming from an acute myeloid leukemia (AML) study. The results obtained are finally compared with the ones of traditional similarity calculation methods. Results In the analysis of the synthetic data set, where the ground truth is known, we measured the capability of reconstructing the correct clusters, while in the AML study we evaluated the Kaplan-Meier curves obtained with the different clusters and measured their statistical difference by means of the log-rank test. In presence of noise and sparse data, our data integration method outperform other techniques, both in the synthetic and in the AML data. Discussion In case of multiple heterogeneous data sources, a matrix trifactorization technique can successfully fuse all the information in a joint model. We demonstrated how this approach can be efficiently applied to discover meaningful patient similarities and therefore may be considered a reliable data driven strategy for the definition of new research hypothesis for precision oncology. Conclusion The better performance of the proposed approach presents an advantage over previous methods to provide accurate patient similarities supporting precision medicine.
Supplementary data are available at bioinformatics online.
Background: Reverse engineering of transcriptional regulatory networks (TRN) from genomics data has always represented a computational challenge in System Biology. The major issue is modeling the complex crosstalk among transcription factors (TFs) and their target genes, with a method able to handle both the high number of interacting variables and the noise in the available heterogeneous experimental sources of information. Results: In this work, we propose a data fusion approach that exploits the integration of complementary omics-data as prior knowledge within a Bayesian framework, in order to learn and model large-scale transcriptional networks. We develop a hybrid structure-learning algorithm able to jointly combine TFs ChIP-Sequencing data and gene expression compendia to reconstruct TRNs in a genomewide perspective. Applying our method to high-throughput data, we verified its ability to deal with the complexity of a genomic TRN, providing a snapshot of the synergistic TFs regulatory activity. Given the noisy nature of data-driven prior knowledge, which potentially contains incorrect information, we also tested the method's robustness to false priors on a benchmark dataset, comparing the proposed approach to other regulatory network reconstruction algorithms. We demonstrated the effectiveness of our framework by evaluating structural commonalities of our learned genomic network with other existing networks inferred by different DNA binding information-based methods. Conclusions: This Bayesian omics-data fusion based methodology allows to gain a genome-wide picture of the transcriptional interplay, helping to unravel key hierarchical transcriptional interactions, which could be subsequently investigated, and it represents a promising learning approach suitable for multi-layered genomic data integration, given its robustness to noisy sources and its tailored framework for handling high dimensional data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.