This article contributes a highly accurate model for predicting the melting points (MPs) of medicinal chemistry compounds. The model was developed using the largest published data set, comprising more than 47k compounds. The distributions of MPs in drug-like and drug lead sets showed that >90% of molecules melt within [50,250] °C. The final model achieved an RMSE of less than 33 °C for molecules from this temperature interval, which is the most important for medicinal chemistry users. This performance was obtained with a consensus model that was significantly more accurate than the individual models. We found that compounds with reactive and unstable groups were overrepresented among outlying compounds. These compounds could decompose during storage or measurement, thus introducing experimental errors. While filtering the data by removing outliers generally increased the accuracy of individual models, it did not significantly affect the results of the consensus models. The three distance-to-model measures that we analyzed did not allow us to flag molecules whose MP values fell outside the applicability domain of the model. We believe that this negative result and the public availability of data from this article will encourage future studies to develop better approaches to define the applicability domain of models. The final model, MP data, and identified reactive groups are available online at .
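The two quantities this abstract relies on, a consensus prediction and an RMSE restricted to a temperature interval, can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes the consensus is a plain average of the individual model predictions, and the function names and all numeric values are hypothetical.

```python
import math


def consensus(predictions_per_model):
    """Average the predictions of several models, compound by compound.

    Assumption: a simple unweighted mean. The abstract does not say how
    the consensus was formed.
    """
    return [sum(column) / len(column) for column in zip(*predictions_per_model)]


def rmse_in_interval(y_true, y_pred, low=50.0, high=250.0):
    """RMSE over compounds whose experimental value lies in [low, high]."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if low <= t <= high]
    if not pairs:
        raise ValueError("no compounds fall inside the interval")
    squared_error = sum((t - p) ** 2 for t, p in pairs)
    return math.sqrt(squared_error / len(pairs))


# Hypothetical melting points in degrees Celsius (not data from the article).
experimental = [100.0, 200.0, 300.0]
model_a = [110.0, 190.0, 330.0]
model_b = [90.0, 220.0, 310.0]

combined = consensus([model_a, model_b])
print(rmse_in_interval(experimental, model_a))
print(rmse_in_interval(experimental, combined))
```

In this toy data the errors of the two models partly cancel, so the averaged prediction has a lower in-interval RMSE than `model_a` alone. The third compound is ignored by `rmse_in_interval` because its experimental value lies outside [50, 250].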
Background: QSAR is an established and powerful method for cheap in silico assessment of physicochemical properties and biological activities of chemical compounds. However, QSAR models are rather complex mathematical constructs that cannot easily be interpreted. Medicinal chemists would benefit from practical guidance regarding which molecules to synthesize. Another possible approach is analysis of pairs of very similar molecules, so-called matched molecular pairs (MMPs). Such an approach allows identification of molecular transformations that affect particular activities (e.g. toxicity). In contrast to QSAR, chemical interpretation of these transformations is straightforward. Furthermore, such transformations can give medicinal chemists useful hints for the hit-to-lead optimization process. Results: The current study suggests a combination of QSAR and MMP approaches by finding MMP transformations based on QSAR predictions for large chemical datasets. The study shows that such an approach, referred to as prediction-driven MMP analysis, is a useful tool for medicinal chemists, allowing identification of large numbers of “interesting” transformations that can be used to drive the molecular optimization process. All the methodological developments have been implemented as software products available online as part of OCHEM (http://ochem.eu/). Conclusions: The prediction-driven MMP methodology was exemplified by two use cases: modelling of aquatic toxicity and CYP3A4 inhibition. This approach helped us to interpret QSAR models and allowed identification of a number of “significant” molecular transformations that affect the desired properties. This can facilitate drug design as part of the molecular optimization process. Graphical Abstract: Matched molecular pairs and transformation graphs facilitate an interpretable molecular optimisation process.
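The idea of prediction-driven MMP analysis can be sketched in a few lines: take pairs of molecules that differ by one transformation, look up a model's predicted value for each molecule, and summarise the predicted change per transformation. This is a toy sketch under stated assumptions and not the OCHEM implementation: the molecule identifiers, transformation labels, predicted values, and the use of a mean difference as the summary are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_transformations(pairs, predicted):
    """Group matched pairs by transformation and average the predicted change.

    pairs: iterable of (molecule_a, molecule_b, transformation) tuples.
    predicted: mapping from molecule id to a model's predicted value.
    Returns {transformation: (mean predicted change, number of pairs)}.
    """
    deltas = defaultdict(list)
    for mol_a, mol_b, transformation in pairs:
        deltas[transformation].append(predicted[mol_b] - predicted[mol_a])
    return {t: (mean(d), len(d)) for t, d in deltas.items()}


# Hypothetical predictions and pairs, for illustration only.
predicted = {"A1": 1.0, "B1": 2.0, "A2": 3.0, "B2": 3.5, "C1": 0.5}
pairs = [
    ("A1", "B1", "H>>Cl"),
    ("A2", "B2", "H>>Cl"),
    ("A1", "C1", "H>>OH"),
]

summary = summarize_transformations(pairs, predicted)
for transformation, (avg_change, n_pairs) in summary.items():
    print(transformation, avg_change, n_pairs)
```

A real analysis would also need a way to enumerate the pairs from chemical structures and a statistical test for deciding which transformations count as significant; neither is shown here.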
The ToxCast EPA challenge was managed by TopCoder in Spring 2014. The goal of the challenge was to develop a model to predict the lowest effect level (LEL) concentration based on in vitro measurements and calculated in silico descriptors. This article summarizes the computational steps used to develop the Rank-I model, which achieved the lowest prediction error on the secret test data set of the challenge. The model was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM), and it is freely available at . Surprisingly, this model does not use any in vitro measurements. The logic of the decision steps used to develop the model and the reasons for excluding in vitro measurements are described. We also show that inclusion of in vitro assays would not improve the accuracy of the model.
The use of long-term animal studies for human and environmental toxicity estimation is more discouraged than ever before. Alternative models for toxicity prediction, including QSAR studies, are gaining more ground. A recent approach is to combine in vitro chemical profiling and in silico chemical descriptors with the knowledge about toxicity pathways to derive a unique signature for toxicity endpoints. In this study we investigate the ToxCast™ Phase I data regarding their ability to predict long-term animal toxicity. We investigated thousands of models constructed in an effort to predict 61 toxicity endpoints using multiple descriptor packages and hundreds of in vitro assays. We investigated the effect of in vitro assays and biochemical pathways on model performance. We identified 10 toxicity endpoints where biologically derived descriptors from in vitro assays or pathway perturbations improved the predictive ability of the models. In vivo toxicity endpoints proved generally challenging to model. Few endpoints could be readily modeled with a balanced accuracy (BA) above 0.7. We also constructed in silico models to predict the outcome of 144 in vitro assays. These models showed better statistical metrics, with 79 out of 144 assays having a median balanced accuracy above 0.7. This suggests that the in vitro datasets have a better modelability than in vivo animal toxicities for the given datasets. Moreover, we published an online platform (http://iprior.ochem.eu) that automates large-scale model building and analysis.
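For readers unfamiliar with the metric, balanced accuracy for a binary classifier is the mean of sensitivity and specificity. The sketch below uses that standard definition; the abstract does not state how BA was computed in the study, and the labels in the example are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced data: 3 actives, 7 inactives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)
print(balanced_accuracy(y_true, y_pred))
```

On imbalanced data such as this, plain accuracy looks high because most compounds are inactive, while balanced accuracy falls below a 0.7 threshold because two of the three actives are missed.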
Fully homomorphic encryption (FHE) is an encryption scheme which enables computation on encrypted data without revealing the underlying data. While there have been many advances in the field of FHE, developing programs using FHE still requires expertise in cryptography. In this white paper, we present a fully homomorphic encryption transpiler that allows developers to convert high-level code (e.g., C++) that works on unencrypted data into high-level code that operates on encrypted data. Thus, our transpiler makes transformations possible on encrypted data. Our transpiler builds on Google's open-source XLS SDK [1] and uses an off-the-shelf FHE library, TFHE [2], to perform low-level FHE operations. The transpiler design is modular, which means the underlying FHE library as well as the high-level input and output languages can vary. This modularity will help accelerate FHE research by providing an easy way to compare arbitrary programs in different FHE schemes side-by-side. We hope this lays the groundwork for eventual easy adoption of FHE by software developers. As a proof-of-concept, we are releasing an experimental transpiler [3] as open-source software.
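To give a feel for what "low-level FHE operations" can look like, the sketch below rewrites integer addition as a sequence of single-bit gate operations. It assumes a gate-level view of FHE, in which each gate would be evaluated on an encrypted bit; here the bits are plain integers and nothing is encrypted. The function names are hypothetical and this is not the transpiler's actual output or the TFHE API.

```python
def to_bits(value, width=8):
    """Little-endian list of bits for a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def full_adder(a, b, carry_in):
    """One-bit adder built only from XOR, AND and OR gates."""
    partial = a ^ b
    total = partial ^ carry_in
    carry_out = (a & b) | (carry_in & partial)
    return total, carry_out


def add_bits(x_bits, y_bits):
    """Ripple-carry addition; the result wraps around on overflow."""
    carry = 0
    result = []
    for a, b in zip(x_bits, y_bits):
        total, carry = full_adder(a, b, carry)
        result.append(total)
    return result


print(from_bits(add_bits(to_bits(19), to_bits(23))))
```

In an encrypted setting each `^`, `&` and `|` would be replaced by the corresponding homomorphic gate on ciphertext bits, which is why a fixed bit width and data-independent control flow matter in this style of program.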