Background: Determining which target to pursue is a challenging and error-prone first step in developing a therapeutic treatment for a disease, where missteps are potentially very costly given the long time frames and high expenses of drug development. With current informatics technology and machine learning algorithms, it is now possible to computationally discover therapeutic hypotheses by predicting clinically promising drug targets based on the evidence associating drug targets with disease indications. We have collected this evidence from Open Targets and additional databases that cover 17 sources of evidence for target-indication association and represented the data as a tensor of 21,437 × 2,211 × 17.
Results: As a proof-of-concept, we identified examples of successes and failures of target-indication pairs in clinical trials across 875 targets and 574 disease indications to build a gold-standard data set of 6,140 known clinical outcomes. We designed and executed three benchmarking strategies to examine the performance of multiple machine learning models: Logistic Regression, LASSO, Random Forest, Tensor Factorization and Gradient Boosting Machine. With 10-fold cross-validation, tensor factorization achieved AUROC = 0.82 ± 0.02 and AUPRC = 0.71 ± 0.03. Across multiple validation schemes, this performance was comparable to or better than that of the other methods.
Conclusion: In this work, we benchmarked a machine learning technique called tensor factorization for the problem of predicting clinical outcomes of therapeutic hypotheses. Results have shown that this method can achieve equal or better prediction performance compared with a variety of baseline models. We demonstrate one application of the method to predict outcomes of trials on novel indications of approved drug targets. This work can be expanded to targets and indications that have never been clinically tested and to proposing novel target-indication hypotheses. Our proposed biologically-motivated cross-validation schemes provide insight into the robustness of the prediction performance. This has significant implications for all future methods that try to address this seminal problem in drug discovery.
Electronic supplementary material: The online version of this article (10.1186/s12859-019-2664-1) contains supplementary material, which is available to authorized users.
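The AUROC figure quoted in this abstract can be read as the probability that a randomly chosen successful target-indication pair is scored higher than a randomly chosen failed one. The following is a minimal Python sketch of that pairwise definition, not the authors' evaluation code; the function name `auroc` and the labels and scores are hypothetical values chosen for illustration only.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for a clinical success, 0 for a failure (hypothetical encoding).
    scores: predicted probability of success for each pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Illustrative data only: six target-indication pairs with made-up scores.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = auroc(example_labels, example_scores)
print(f"AUROC = {example_auc:.3f}")
```

In a k-fold setting such as the 10-fold cross-validation mentioned in the abstract, a value like this would be computed once per held-out fold and then summarized as a mean and a spread.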
Determining which target to pursue is a challenging and error-prone first step in developing a therapeutic treatment for a disease, where missteps are potentially very costly given the long time frames and high expenses of drug development. We identified examples of successes and failures of target-indication pairs in clinical trials across 875 targets and 574 disease indications to build a gold-standard data set of 6,140 known clinical outcomes. We used information from Open Targets and other databases that covered 17 different sources of evidence for target-indication association and represented the data as a tensor of 21,437 × 2,211 × 17 with over two million non-null values. We designed and executed three benchmarking strategies to examine the performance of multiple machine learning models: Logistic Regression, Elastic Net, Random Forest, Tensor Factorization and Gradient Boosting Machine. With ten-fold cross-validation, tensor factorization achieved AUROC = 0.82 ± 0.02 and AUPRC = 0.71 ± 0.03. Across multiple validation schemes, this performance was comparable to or better than that of the other methods. Tensor factorization is a generalization of matrix factorization, which has been successfully exploited in recommendation systems that suggest items to users based on their existing preferences for a small number of items. Our application, using Bayesian probabilistic modelling, extends the capacity of matrix factorization to model multiple relationships between and among targets and indications. We use the model to show that our predicted probabilities of success correlate with clinical phases, and that within a clinical phase we can predict which trials are most likely to succeed.
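To make the idea of factorizing a target × indication × evidence tensor concrete, the sketch below shows how a rank-R CP-style (trilinear) model reconstructs any tensor entry from three small factor matrices, including entries that were never observed. This is only an illustration under stated assumptions: it is not the Bayesian model described in the abstract, the factor values are hand-picked rather than learned, and the names `cp_entry` and `cp_tensor` are hypothetical.

```python
def cp_entry(A, B, C, i, j, k):
    """Reconstruct one entry as the sum over r of A[i][r] * B[j][r] * C[k][r]."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))


def cp_tensor(A, B, C):
    """Reconstruct the full tensor as nested lists with shape len(A) x len(B) x len(C)."""
    return [[[cp_entry(A, B, C, i, j, k) for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


# Hand-picked rank-2 factors (illustrative values, not learned from data).
targets = [[1.0, 0.0],       # latent vector for hypothetical target 0
           [0.5, 2.0]]       # latent vector for hypothetical target 1
indications = [[2.0, 1.0],   # hypothetical indication 0
               [0.0, 1.0],   # hypothetical indication 1
               [1.0, 1.0]]   # hypothetical indication 2
evidence = [[1.0, 1.0],      # hypothetical evidence source 0
            [3.0, 0.5]]      # hypothetical evidence source 1

full = cp_tensor(targets, indications, evidence)
score = cp_entry(targets, indications, evidence, 1, 2, 1)
print("shape:", len(full), "x", len(full[0]), "x", len(full[0][0]))
print("entry (target 1, indication 2, evidence 1):", score)
```

In a fitted model the factor matrices would be estimated from the observed entries, and the reconstruction of an unobserved (target, indication) cell is what serves as the prediction, in the same way a recommender predicts a rating for an item the user has not yet seen.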
Background: Clear cell renal cell carcinoma (ccRCC) is the most common and a highly heterogeneous subtype of renal cell carcinoma. Dysregulation of the basal cell adhesion molecule (BCAM) gene is associated with poor prognosis in various cancers. However, the dysregulated functions and related multi-omics features of BCAM in ccRCC remain unclear.
Results: BCAM expression was aberrantly downregulated in ccRCC and correlated with adverse pathological parameters and poor prognosis. Low mRNA expression of BCAM was strongly associated with its CpG methylation levels and with BAP1 mutation status. Patients with low BCAM expression and a concomitant BAP1 mutation had a worse prognosis. Using RNA-seq data from The Cancer Genome Atlas, we found that, compared with the BCAM-high expression subgroup, ccRCC patients in the BCAM-low expression subgroup had significantly higher levels of immune infiltration, higher immune checkpoint expression levels and a lower TIDE (tumor immune dysfunction and exclusion) score, indicating a potentially better response to immunotherapy. Data from the Clinical Proteomic Tumor Analysis Consortium further validated the association between low BCAM expression and the CD8+ inflamed phenotype at the protein level. Meanwhile, our results suggested that angiogenesis-related pathways were enriched in the BCAM-high expression subgroup. More importantly, according to data from the GDSC database, the BCAM-high expression subgroup should be more sensitive to anti-angiogenic therapies, including sorafenib, pazopanib and axitinib.
Conclusions: These results suggest that BCAM could serve as a biomarker for distinguishing different tumor microenvironment phenotypes, predicting prognosis and helping therapeutic decision-making for patients with ccRCC.
To address the difficulty of establishing a mechanistic quality model for the tobacco leaf redrying process, this paper proposes a quality prediction model based on principal component analysis (PCA) and an improved back propagation (BP) neural network. First, 12 input variables are identified by analyzing the factors that affect quality in the redrying process. Second, PCA is used to eliminate the correlation among the original input-layer data, transforming the 12 input variables into 6 uncorrelated indicators. Third, the quality prediction model based on the improved BP neural network is established. Finally, a simulation experiment is conducted: the average prediction error is as low as 1.03%, and the absolute forecasting error ranges from 0.16% to 2.49%. The results indicate that the model is simpler and more stable in prediction, and that it can fully meet the actual requirements of the tobacco leaf redrying process.
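The PCA step described in this abstract, which turns correlated inputs into a smaller set of uncorrelated indicators, can be sketched in plain Python as follows. This is a minimal illustration assuming power iteration with deflation on the sample covariance matrix; the paper does not state which PCA algorithm was used, the two-variable data are made up, and the helper names (`covariance`, `matvec`, `top_components`) are hypothetical.

```python
import math


def covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def top_components(cov, k, iters=200):
    """Return the k leading principal directions and their variances."""
    d = len(cov)
    m = [row[:] for row in cov]              # work on a copy; it gets deflated
    components, variances = [], []
    for _component in range(k):
        v = [1.0 / (j + 1) for j in range(d)]   # fixed, non-degenerate start
        for _step in range(iters):
            w = matvec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, matvec(m, v)))  # Rayleigh quotient
        components.append(v)
        variances.append(lam)
        # Deflate: remove the direction just found so the next one is orthogonal.
        m = [[m[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return components, variances


# Made-up measurements of two correlated process variables.
samples = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
components, variances = top_components(covariance(samples), k=2)
explained = variances[0] / sum(variances)
print("first direction:", components[0])
print("share of variance explained by it:", explained)
```

With 12 inputs, the same procedure would be run with `k=6` (or with `k` chosen from the cumulative explained variance), and the projections onto the retained directions would feed the neural network in place of the raw variables.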
To identify the key input parameters that cause output quality to go out of control during the manufacturing process, an integrated quality diagnosis algorithm for input parameters was proposed. The method extends traditional quality control and diagnosis methods, which address only the output quality of the manufacturing process: it can also examine the input parameters of the process and provide input-parameter sensitivities to guide adjustment. First, a residual-error T² control chart was established to detect quality failures. Then, the BN-MTY method was applied to explain the cause of the quality failure signalled in the T² control chart and to locate the root output quality characteristic responsible for the process quality anomaly. An integrated method combining a neural network with sensitivity analysis was used to obtain the weights and threshold values of the neurons in the forecasting network, which were then used to calculate the sensitivities of the input parameters to the root output quality characteristic. These sensitivities represent the importance of each input parameter to the output quality failure. The integrated quality diagnosis method can therefore diagnose both the output quality characteristics and the input parameters.
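The statistic behind a T² control chart can be illustrated with a short Python sketch. It assumes the standard Hotelling T² form, that is, the squared distance of a new observation from the in-control mean weighted by the inverse sample covariance, and it is restricted to two variables so the matrix inverse can be written out by hand. The abstract applies the chart to residual errors and pairs it with the BN-MTY method, neither of which is reproduced here; the data and the name `hotelling_t2` are hypothetical, and the control limit is deliberately left out.

```python
def hotelling_t2(reference, x):
    """Hotelling T^2 of a 2-variable observation x against in-control reference data."""
    n = len(reference)
    mx = sum(p[0] for p in reference) / n
    my = sum(p[1] for p in reference) / n
    # Sample variances and covariance (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in reference) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in reference) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in reference) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mx, x[1] - my
    # d^T S^{-1} d using the closed-form inverse of a 2 x 2 matrix.
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# Made-up in-control measurements of two positively correlated quality characteristics.
in_control = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]

t2_consistent = hotelling_t2(in_control, (4.5, 4.5))  # far from the mean, follows the correlation
t2_breaking = hotelling_t2(in_control, (4.5, 0.5))    # same distance, violates the correlation
print("T2 along the correlation:", t2_consistent)
print("T2 against the correlation:", t2_breaking)
```

The two test points are equally far from the mean in Euclidean terms, yet the one that breaks the usual relationship between the variables gets the larger T² value, which is why a multivariate chart can flag failures that separate univariate charts would miss.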