Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction

Coley, Connor W.; Barzilay, Regina; Green, William; Jaakkola, Tommi S.; Jensen, Klavs F.

doi:10.1021/acs.jcim.6b00601

Cited by 411 publications

(377 citation statements)

References 53 publications

Citation statement types: Supporting; Mentioning: 374; Contrasting; Unclassified


“…In particular, they have achieved a remarkable performance in classifying documents in citation networks 39 , modeling and predicting chemical properties of molecules 40,41,67 and protein interface prediction with applications in drug discovery and design 42 . Here, we propose our model based on the work of Kipf & Welling 39 .…”

Section: Methods (mentioning)

confidence: 99%

“…More recently, geometric deep learning methods 37 and more specifically Graph Convolutional Networks (GCNs) 38,39 have offered a way to overcome these limitations by generalizing convolutional operations on more natural graph-like molecular representations. Graph Convolutional Networks have shown tremendous success in various problems ranging from learning useful molecular fingerprints 40 , to predicting biochemical activity of drugs 41 , to protein interface prediction 42 .…”

mentioning

confidence: 99%


Structure-Based Protein Function Prediction using Graph Convolutional Networks

Gligorijević, Renfrew, Kościółek, et al. 2019. Preprint.

Recent massive increases in the number of sequences available in public databases challenge current experimental approaches to determining protein function. These methods are limited by both the large scale of these sequence databases and the diversity of protein functions. We present a deep learning Graph Convolutional Network (GCN) trained on sequence and structural data and evaluate it on ~40k proteins with known structures and functions from the Protein Data Bank (PDB). Our GCN predicts functions more accurately than Convolutional Neural Networks trained on sequence data alone and competing methods. Feature extraction via a language model removes the need for constructing multiple sequence alignments or feature engineering. Our model learns general structure-function relationships by robustly predicting functions of proteins with ≤ 30% sequence identity to the training set. Using class activation mapping, we can automatically identify structural regions at the residue level that lead to each function prediction for every protein confidently predicted, advancing site-specific function prediction. De-noising inherent in the trained model allows only a minor drop in performance when structure predictions are used, including multiple de novo protocols. We use our method to annotate all proteins in the PDB, making several new confident function predictions spanning both fold and function trees.


“…They update node-level representations based on the neighbourhood, and compute a graph-level representations (molecule representations, in our case) based on all nodes representations. We will call the "minimal" formulation of GNN, the intuitive and simple encoding of the graph in which the AGGREGATE (l) graph function is defined by the sum of all nodes representations, at each layer [28,29,32–35]:…”

Section: Molecular Graph Encoder (mentioning)

confidence: 99%

Evaluation of deep and shallow learning methods in chemogenomics for the prediction of drugs specificity

Playe, Stoven. 2020. J Cheminform.

Chemogenomics, also called proteochemometrics, covers a range of computational methods that can be used to predict protein-ligand interactions at large scales in the protein and chemical spaces. They differ from more classical ligand-based methods (also called QSAR) that predict ligands for a given protein receptor. In the context of the drug discovery process, chemogenomics makes it possible to tackle the question of predicting off-target proteins for drug candidates, one of the main causes of undesirable side-effects and failure within drug development processes. The present study compares shallow and deep machine-learning approaches for chemogenomics, and explores data augmentation techniques for deep learning algorithms in chemogenomics. Shallow machine-learning algorithms rely on expert-based chemical and protein descriptors, while recent developments in deep learning algorithms make it possible to learn abstract numerical representations of molecular graphs and protein sequences, in order to optimise the performance of the prediction task. We first propose a formulation of chemogenomics with deep learning, called the chemogenomic neural network (CN), as a feed-forward neural network taking as input the combination of molecule and protein representations learnt by molecular graph and protein sequence encoders. We show that, on large datasets, the deep learning CN model outperforms state-of-the-art shallow methods, and competes with deep methods with expert-based descriptors. However, on small datasets, shallow methods present better prediction performance than deep learning methods. Then, we evaluate data augmentation techniques, namely multi-view and transfer learning, to improve the prediction performance of the chemogenomic neural network. We conclude that a promising research direction is to integrate heterogeneous sources of data such as auxiliary tasks for which large datasets are available, or independently, multiple molecule and protein attribute views.
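The "minimal" GNN formulation quoted in the citation statement for this paper (node representations updated from the neighbourhood, with the graph-level representation taken as the sum of all node representations at each layer) can be illustrated with a small sketch. This is not the authors' code. The update rule (ReLU over a self term plus a summed-neighbour term), the weight matrices, the helper names, and the toy 3-atom graph are all assumptions made for illustration; only the sum-based AGGREGATE comes from the quoted text.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def vec_add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def relu(v: Vector) -> Vector:
    """Element-wise ReLU non-linearity."""
    return [max(0.0, x) for x in v]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for a row-major matrix."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def update_nodes(
    h: Dict[int, Vector],
    adjacency: Dict[int, List[int]],
    w_self: Matrix,
    w_nbr: Matrix,
) -> Dict[int, Vector]:
    """One message-passing layer (assumed form, not taken from the source):
    h_v' = ReLU(W_self h_v + W_nbr * sum of neighbour representations)."""
    new_h: Dict[int, Vector] = {}
    for node, feats in h.items():
        nbr_sum = [0.0] * len(feats)
        for nbr in adjacency[node]:
            nbr_sum = vec_add(nbr_sum, h[nbr])
        new_h[node] = relu(vec_add(matvec(w_self, feats), matvec(w_nbr, nbr_sum)))
    return new_h


def aggregate(h: Dict[int, Vector]) -> Vector:
    """Graph-level representation: plain sum of all node representations."""
    dim = len(next(iter(h.values())))
    total = [0.0] * dim
    for feats in h.values():
        total = vec_add(total, feats)
    return total


# Toy 3-atom chain 0-1-2 with 2-dimensional node features (made-up values).
adjacency = {0: [1], 1: [0, 2], 2: [1]}
h0 = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical weights, chosen for readability

h1 = update_nodes(h0, adjacency, identity, identity)
graph_repr_layer0 = aggregate(h0)  # sum of the input node features
graph_repr_layer1 = aggregate(h1)  # sum after one neighbourhood update
```

Because the aggregation is a plain sum, the graph-level vector does not depend on the order in which atoms are listed, which is what makes this encoding usable for molecules of varying size.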


“…We focused on well-established machine learning models instead of more recent deep learning models, such as graph-based neural networks. 36,64–67 This is because our main goal was to investigate the virtual screening principles for choosing the best model for a specific task (PriA-SSB AS) in a practical setting instead of broadly benchmarking virtual screening algorithms. In addition, a recent benchmark showed that conventional methods outperformed graph-based methods on most biophysics datasets.…”

Section: Discussion (mentioning)

confidence: 99%

Practical Model Selection for Prospective Virtual Screening

Liu, Alnammi, Ericksen, et al. 2018. Preprint.

Virtual (computational) high-throughput screening provides a strategy for prioritizing compounds for experimental screens, but the choice of virtual screening algorithm depends on the dataset and evaluation strategy. We consider a wide range of ligand-based machine learning and docking-based approaches for virtual screening on two protein-protein interactions, PriA-SSB and RMI-FANCM, and present a strategy for choosing which algorithm is best for prospective compound prioritization. Our workflow identifies a random forest as the best algorithm for these targets over more sophisticated neural network-based models. The top 250 predictions from our selected random forest recover 40 of the 62 active compounds from a library of 25,279 new molecules assayed on PriA-SSB. We show that virtual screening methods that perform well in public datasets and synthetic benchmarks, like multi-task neural networks, may not always translate to prospective screening performance on a specific assay of interest.

bioRxiv preprint, first posted online Jun. 4, 2018; doi: http://dx.doi.org/10.1101/337956 (CC-BY 4.0 International license).
