The prerequisite of therapeutic drug design and discovery is to identify novel molecules and develop lead candidates with desired biophysical and biochemical properties. Deep generative models have demonstrated their ability to find such molecules by exploring a huge chemical space efficiently. An effective way to generate new molecules with desired target properties is to constrain the critical functional groups or the core scaffolds in the generation process. To this end, we developed a domain-aware generative framework called 3D-Scaffold that takes 3D coordinates of the desired scaffold as input and generates 3D coordinates of novel therapeutic candidates as output while always preserving the desired scaffolds in the generated structures. We demonstrated that our framework generates predominantly valid, unique, novel, and experimentally synthesizable molecules that have drug-like properties similar to the molecules in the training set. Using domain-specific data sets, we generate covalent and noncovalent antiviral inhibitors targeting viral proteins. To measure the success of our framework in generating therapeutic candidates, generated structures were subjected to high-throughput virtual screening via docking simulations, which show favorable interactions against SARS-CoV-2 main protease (Mpro) and nonstructural protein endoribonuclease (NSP15) targets. Most importantly, our deep learning model performs well with relatively small 3D structural training data and quickly learns to generalize to new scaffolds, highlighting its potential application to other domains for generating target-specific candidates.
The prerequisite of therapeutic drug design is to identify novel molecules with desired biophysical and biochemical properties. Deep generative models have demonstrated their ability to find such molecules by exploring a huge chemical space efficiently. An effective way to obtain molecules with desired target properties is to preserve critical scaffolds in the generation process. To this end, we propose a domain-aware generative framework called 3D-Scaffold that takes 3D coordinates of the desired scaffold as input and generates 3D coordinates of novel therapeutic candidates as output while always preserving the desired scaffolds in the generated structures. We show that our framework generates predominantly valid, unique, novel, and experimentally synthesizable molecules that have drug-like properties similar to the molecules in the training set. Using domain-specific datasets, we generate covalent and non-covalent antiviral inhibitors. To measure the success of our framework in generating therapeutic candidates, generated structures were subjected to high-throughput virtual screening via docking simulations, which show favorable interactions against SARS-CoV-2 main protease and non-structural protein endoribonuclease (NSP15) targets. Most importantly, our model performs well with relatively small volumes of training data and generalizes to new scaffolds, making it applicable to other domains.
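The "valid, unique, novel" claim above can be made concrete with a small sketch. The abstract does not define these metrics, so the definitions below are an assumption based on conventions common in molecular generation benchmarks: validity as the fraction of generated molecules that pass a validity check, uniqueness as the fraction of valid molecules that are distinct, and novelty as the fraction of distinct valid molecules absent from the training set. The function name `generation_metrics` and the `is_valid` callback are hypothetical; in practice the check would come from a cheminformatics toolkit, and molecules would need a canonical string form for the set comparisons to be meaningful.

```python
from typing import Callable, Dict, Iterable


def generation_metrics(
    generated: Iterable[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Compute validity, uniqueness, and novelty fractions.

    Assumed definitions (not taken from the paper):
      validity   = valid / generated
      uniqueness = distinct valid / valid
      novelty    = distinct valid not in training / distinct valid
    Molecules are assumed to be canonical strings, so equal strings
    mean equal molecules.
    """
    generated = list(generated)
    valid = [mol for mol in generated if is_valid(mol)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


if __name__ == "__main__":
    # Toy data: "bad" stands in for a structure that fails the validity check.
    sample = ["CCO", "CCO", "CCN", "bad", "CCC"]
    print(generation_metrics(sample, {"CCC"}, lambda mol: mol != "bad"))
```

Each ratio uses the previous stage as its denominator, so a model that emits many duplicates is penalized in uniqueness without lowering its validity score.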
Protein–ligand interactions (PLIs) are essential for biochemical functionality, and their identification is crucial for estimating biophysical properties for rational therapeutic design. Currently, experimental characterization of these properties is the most accurate method; however, it is very time-consuming and labor-intensive. A number of computational methods have been developed in this context, but most existing PLI prediction methods depend heavily on 2D protein sequence data. Here, we present a novel parallel graph neural network (GNN) to integrate knowledge representation and reasoning for PLI prediction to perform deep learning guided by expert knowledge and informed by 3D structural data. We develop two distinct GNN architectures: $\mathrm{GNN}_{\mathrm{F}}$ is the base implementation that employs distinct featurization to enhance domain-awareness, while $\mathrm{GNN}_{\mathrm{P}}$ is a novel implementation that can predict with no prior knowledge of the intermolecular interactions. The comprehensive evaluation demonstrated that the GNN can successfully capture the binary interactions between a ligand and a protein's 3D structure, with 0.979 test accuracy for $\mathrm{GNN}_{\mathrm{F}}$ and 0.958 for $\mathrm{GNN}_{\mathrm{P}}$ for predicting activity of a protein–ligand complex. These models are further adapted for regression tasks to predict experimental binding affinities and $\mathrm{pIC}_{50}$, which are crucial for a compound's potency and efficacy. We achieve Pearson correlation coefficients of 0.66 and 0.65 on experimental affinity and 0.50 and 0.51 on $\mathrm{pIC}_{50}$ with $\mathrm{GNN}_{\mathrm{F}}$ and $\mathrm{GNN}_{\mathrm{P}}$, respectively, outperforming similar 2D sequence-based models. Our method can serve as an interpretable and explainable artificial intelligence (AI) tool for predicted activity, potency, and biophysical properties of lead candidates. To this end, we show the utility of $\mathrm{GNN}_{\mathrm{P}}$ on SARS-CoV-2 protein targets by screening a large compound library and comparing the predictions with the experimentally measured data.
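The regression results above are reported as Pearson correlation coefficients between predicted and experimental values. As a reminder of what that number measures, here is a minimal standard-library sketch of the sample Pearson coefficient. The function name `pearson_r` and the toy values are illustrative assumptions and are not taken from the paper or its data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two points are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered products; the 1/(n-1) factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical predicted vs. measured values, for illustration only.
    predicted = [1.0, 2.0, 3.0, 4.0]
    measured = [1.0, 3.0, 2.0, 4.0]
    print(pearson_r(predicted, measured))
```

The coefficient captures only linear agreement and is insensitive to scale and offset, so a model can score well while still being systematically biased; error metrics such as RMSE are a useful complement.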
Protein-ligand interactions (PLIs) are fundamental to biochemical research, and their identification is crucial for estimating biophysical and biochemical properties for rational therapeutic design. Currently, experimental characterization of these properties is the most accurate method; however, it is very time-consuming and labor-intensive. A number of computational methods have been developed in this context, but most existing PLI prediction methods depend heavily on 2D protein sequence data. Here, we present a novel parallel graph neural network (GNN) to integrate knowledge representation and reasoning for PLI prediction to perform deep learning guided by expert knowledge and informed by 3D structural data. We develop two distinct GNN architectures: $\mathrm{GNN}_{\mathrm{F}}$ is the base implementation that employs distinct featurization to enhance domain-awareness, while $\mathrm{GNN}_{\mathrm{P}}$ is a novel implementation that can predict with no prior knowledge of the intermolecular interactions. Comprehensive evaluation demonstrated that the GNN can successfully capture the binary interactions between a ligand and a protein's 3D structure, with 0.979 test accuracy for $\mathrm{GNN}_{\mathrm{F}}$ and 0.958 for $\mathrm{GNN}_{\mathrm{P}}$ for predicting activity of a protein-ligand complex. These models are further adapted for