General Graph Neural Network-Based Model To Accurately Predict Cocrystal Density and Insight from Data Quality and Feature Representation

Guo, Jiali; Sun, Ming; Zhao, Xueyan; Shi, Chaojie; Su, Haoming; Guo, Yanzhi; Pu, Xuemei

doi:10.1021/acs.jcim.2c01538

Cited by 11 publications

(5 citation statements)

References 76 publications


“…Message-passing neural networks (MPNNs), as a variant of GNNs, are capable of taking both nodes and edges features as inputs. By incorporating edges features into the model, it is possible to enhance the understanding and representation of molecular graphs, resulting in significantly improved accuracy in predicting molecular chemical properties. Several MPNN-based network architectures have been proposed for predicting other compound attributes, such as cocrystal density, infrared spectra, bond order, and bond energy…”

Section: Introduction (mentioning)

confidence: 99%

“…By incorporating edges features into the model, it is possible to enhance the understanding and representation of molecular graphs, resulting in significantly improved accuracy in predicting molecular chemical properties. 32−35 Several MPNN-based network architectures have been proposed for predicting other compound attributes, such as cocrystal density, 36 infrared spectra, 37 bond order, and bond energy. 38 In this work, a model for predicting the pKa values of C−H acid based on the MPNN framework was constructed.…”

Section: Introduction (mentioning)

confidence: 99%


Explainable Graph Neural Networks with Data Augmentation for Predicting pK_a of C–H Acids

An, Liu, Cai et al. 2023. J. Chem. Inf. Model.

The pKa of C–H acids is an important parameter in the fields of organic synthesis, drug discovery, and materials science. However, the prediction of pKa is still a great challenge due to the limit of experimental data and the lack of chemical insight. Here, a new model for predicting the pKa values of C–H acids is proposed on the basis of graph neural networks (GNNs) and data augmentation. A message passing unit (MPU) was used to extract the topological and target-related information from the molecular graph data, and a readout layer was utilized to retrieve the information on the ionization site C atom. The retrieved information then was adopted to predict pKa by a fully connected network. Furthermore, to increase the diversity of the training data, a knowledge-infused data augmentation technique was established by replacing the H atoms in a molecule with substituents exhibiting different electronic effects. The MPU was pretrained with the augmented data. The efficacy of data augmentation was confirmed by visualizing the distribution of compounds with different substituents and by classifying compounds. The explainability of the model was studied by examining the change of pKa values when a specific atom was masked. This explainability was used to identify the key substituents for pKa. The model was evaluated on two data sets from the iBonD database. Dataset1 includes the experimental pKa values of C–H acids measured in DMSO, while dataset2 comprises the pKa values measured in water. The results show that the knowledge-infused data augmentation technique greatly improves the predictive accuracy of the model, especially when the number of samples is small.
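The pipeline this abstract describes (message passing over the molecular graph, a readout at the ionization-site atom, a fully connected head, and atom masking for explainability) can be illustrated with a toy sketch. This is not the authors' implementation: the sum-based update rule, the scalar edge weights, the single linear layer standing in for the fully connected network, and every function name below are assumptions chosen only to keep the example small and dependency-free.

```python
from typing import Dict, List, Optional, Tuple

Features = List[List[float]]            # one feature vector per atom
Edges = Dict[Tuple[int, int], float]    # undirected bond (i, j) -> scalar edge weight


def message_passing_step(feats: Features, edges: Edges) -> Features:
    """One round: each atom adds the edge-weighted features of its neighbors."""
    updated = [list(h) for h in feats]
    for (i, j), w in edges.items():
        for k in range(len(feats[i])):
            updated[i][k] += w * feats[j][k]
            updated[j][k] += w * feats[i][k]
    return updated


def predict(feats: Features, edges: Edges, site: int,
            weights: List[float], bias: float,
            rounds: int = 2, masked: Optional[int] = None) -> float:
    """Message passing, then readout at the `site` atom, then a linear head.

    Masking (an assumption about how masking could work) zeroes the chosen
    atom's features and drops its bonds before any message is passed.
    """
    h = [list(v) for v in feats]
    if masked is not None:
        h[masked] = [0.0] * len(h[masked])
        edges = {e: w for e, w in edges.items() if masked not in e}
    for _ in range(rounds):
        h = message_passing_step(h, edges)
    site_vector = h[site]  # readout: keep only the ionization-site atom
    return sum(w * x for w, x in zip(weights, site_vector)) + bias


# Hypothetical three-atom chain 0-1-2 with one feature per atom.
toy_feats = [[1.0], [2.0], [3.0]]
toy_edges = {(0, 1): 1.0, (1, 2): 1.0}

full = predict(toy_feats, toy_edges, site=1, weights=[0.5], bias=1.0)
without_atom_2 = predict(toy_feats, toy_edges, site=1, weights=[0.5], bias=1.0, masked=2)
importance_of_atom_2 = full - without_atom_2
```

In this sketch, the difference between the full and masked predictions plays the role of the masking-based importance score; a trained model would use learned update functions and weights rather than the fixed sums used here.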


“…28 Guo et al developed a general MPNN-based framework coupling with global attention to predict cocrystal density and further identified significant atoms to realize the interpretability of the model. 29 To accelerate the drug repurposing and discovery research, Wang et al presented a deep fusion anatomical therapeutic chemical (ATC) prediction model DeepATC, where GCN was used to extract drug topological information. 30 Pham et al developed a mechanism-driven neural network-based architecture DeepCE by incorporating GNN and multihead attention mechanism to support virtual screening of phenotype compounds.…”

Section: Introduction (mentioning)

confidence: 99%

“…Ryu et al employed GAT to accurately predict molecular polarity, solubility, energy, and additionally detected essential features directly relating to target properties. Guo et al developed a general MPNN-based framework coupling with global attention to predict cocrystal density and further identified significant atoms to realize the interpretability of the model. To accelerate the drug repurposing and discovery research, Wang et al presented a deep fusion anatomical therapeutic chemical (ATC) prediction model DeepATC, where GCN was used to extract drug topological information.…”

Section: Introduction (mentioning)

confidence: 99%

Attention Mechanism-Based Graph Neural Network Model for Effective Activity Prediction of SARS-CoV-2 Main Protease Inhibitors: Application to Drug Repurposing as Potential COVID-19 Therapy

Wu, Li, et al. 2023. J. Chem. Inf. Model. (Self Cite)

Compared to de novo drug discovery, drug repurposing provides a time-efficient way to treat coronavirus disease 19 (COVID-19) that is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). SARS-CoV-2 main protease (Mpro) has been proved to be an attractive drug target due to its pivotal involvement in viral replication and transcription. Here, we present a graph neural network-based deep-learning (DL) strategy to prioritize the existing drugs for their potential therapeutic effects against SARS-CoV-2 Mpro. Mpro inhibitors were represented as molecular graphs ready for graph attention network (GAT) and graph isomorphism network (GIN) modeling for predicting the inhibitory activities. The result shows that the GAT model outperforms the GIN and other competitive models and yields satisfactory predictions for unseen Mpro inhibitors, confirming its robustness and generalization. The attention mechanism of GAT enables the model to capture the dominant substructures and thus to realize the interpretability of the model. Finally, we applied the optimal GAT model in conjunction with molecular docking simulations to screen the Drug Repurposing Hub (DRH) database. As a result, 18 drug hits with best consensus prediction scores and binding affinity values were identified as the potential therapeutics against COVID-19. Both the extensive literature searching and evaluations on absorption, distribution, metabolism, excretion, and toxicity (ADMET) illustrate the premium drug-likeness and pharmacokinetic properties of the drug candidates. Overall, our work not only provides an effective GAT-based DL prediction tool for inhibitory activity of SARS-CoV-2 Mpro inhibitors but also provides theoretical guidelines for drug discovery in the COVID-19 treatment.


“…For these methods, it was estimated a variable accuracy in the range of 30–80% depending on the API. To overcome the poor accuracy of the property-based methods, a combination of different tools was also proposed, showing an improvement in the coformer selection of specific systems. Recently, data-driven ML approaches have become increasingly popular due to the rapidity of calculation and promising predictive accuracy. Several algorithms were evaluated, such as support vector machine (SVM), random forest (RF), neural networks (NN), and partial least squares-discriminant analysis (PLS-DA), and also, a wide variety of molecular representations were considered, including molecular descriptors, fingerprint vectors, and molecular graphs. To mention a few studies, Fornari et al proposed using QSAR descriptors and the PLS-DA model to discriminate between the formation of cocrystals and physical mixtures.…”

Section: Introduction (mentioning)

confidence: 99%

Speeding Up the Cocrystallization Process: Machine Learning-Combined Methods for the Prediction of Multicomponent Systems

Birolo, Bravetti, Alladio et al. 2023. Crystal Growth & Design

Pharmaceutical cocrystals are crystalline materials composed of at least two molecules, i.e., an active pharmaceutical ingredient (API) and a coformer, assembled by noncovalent forces. Cocrystallization is successfully applied to improve the physicochemical properties of APIs, such as solubility, dissolution profile, pharmacokinetics, and stability. However, choosing the ideal coformer is a challenging task in terms of time, efforts, and laboratory resources. Several computational tools and machine learning (ML) models have been proposed to mitigate this problem. However, the challenge of achieving a robust and generalizable predictive method is still open. In this study, we propose a new approach to quickly predict the formation of cocrystals, employing partial least squares-discriminant analysis, random forest, and neural networks. The models were based on the data sets of 13 structurally different APIs with both positive and negative cocrystallization outcomes. At the same time, the features were specially selected from a variety of molecular descriptors to explain the phenomenon of the cocrystallization. All of the proposed ML models showed a cross-validation accuracy higher than 83%. Furthermore, this approach was successfully applied to drive the cocrystallization experimental tests of 2-phenylpropionic acid, showcasing the high potential of the ML models in practice.
