Predicting Binding from Screening Assays with Transformer Network Embeddings

Morris, Paul; St. Clair, Rachel; Hahn, William Edward; Barenholtz, Elan

doi:10.1021/acs.jcim.9b01212

Cited by 34 publications (18 citation statements). References: 60 publications.

“…We applied the pretrained end-to-end transformer to 83,000,000 SMILES collected from PubChem to obtain the structural characteristic embedding vectors of each drug using the end-to-end transformer deep neural network based on their frequency and sequential order of each SMILES character. An encoder layer with self-attention operation mapped the SMILES sequence into the latent space based on its relationship with other characters [ 25 ]. A decoder layer had similar structures to encoder layers, and the output of the final decoder layer was the same as the input sequence ( Figure 2(b) ).…”

Section: Methods (mentioning; confidence: 99%)

Deep Learning-Assisted Repurposing of Plant Compounds for Treating Vascular Calcification: An In Silico Study with Experimental Validation

Chao, Tsai, Lee, et al. 2022. Oxidative Medicine and Cellular Longevity.

Background. Vascular calcification (VC) constitutes a subclinical vascular burden and increases cardiovascular mortality. Effective therapeutics for VC have yet to be developed. We aimed to use a deep learning-based strategy to screen for and uncover plant compounds that could potentially be repurposed for managing VC.

Methods. We integrated drugome, interactome, and diseasome information from the Comparative Toxicogenomics Database (CTD), DrugBank, PubChem, Gene Ontology (GO), and BioGRID to analyze drug-disease associations. Deep representation learning was performed using a high-level description of the local network architecture and the features of the entities. This was followed by learning global node embeddings from a heterogeneous network with a graph neural network architecture, and a random forest classifier was then built for prediction. Predicted results were tested for validity in an in vitro VC model based on the probability scores.

Results. We collected 6,790 compounds with available Simplified Molecular-Input Line-Entry System (SMILES) data, 11,958 GO terms, 7,238 diseases, and 25,482 proteins. Local embedding vectors were obtained with an end-to-end transformer network and the node2vec algorithm, and global embedding vectors were learned from the heterogeneous network via the graph neural network. Our algorithm distinguished well between potential compounds, assigning higher prediction scores to compound categories with higher potential and lower scores to other categories. Probability score-dependent selection revealed that antioxidants such as sulforaphane and daidzein were potentially effective compounds against VC, while catechin had a low probability. All three compounds were validated in vitro.

Conclusions. Our findings exemplify the utility of deep learning in identifying promising VC-treating plant compounds. Our model can serve as a quick and comprehensive computational screening tool to assist in the early drug discovery process.
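
The excerpt above describes an encoder whose self-attention places each SMILES character, together with its position, in a latent space. The Python sketch below illustrates only that encoder forward pass, and it rests on several assumptions:

- The character embeddings are deterministic random placeholders rather than trained weights.
- The query, key, and value projections are identity matrices.
- The positional encoding is the sinusoidal scheme from the original Transformer.
- Mean pooling over positions is assumed as the way to obtain one vector per molecule.

All function names are hypothetical. The decoder and the reconstruction training mentioned in the excerpt are omitted.

```python
import math
import random


def tokenize_chars(smiles):
    """Character-level tokenization: each SMILES character becomes one token."""
    return list(smiles)


def positional_encoding(position, d_model):
    """Sinusoidal positional encoding for a single sequence position."""
    encoding = []
    for i in range(d_model):
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head scaled dot-product self-attention.

    Simplification: queries, keys, and values are the input vectors themselves
    (identity projections); a trained encoder learns separate projection matrices.
    """
    d = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in vectors]
        weights = softmax(scores)
        outputs.append([sum(w * value[j] for w, value in zip(weights, vectors)) for j in range(d)])
    return outputs


def embed_smiles(smiles, d_model=8, seed=0):
    """Hypothetical encoder pass: SMILES string -> fixed-size vector via mean pooling."""
    tokens = tokenize_chars(smiles)
    if not tokens:
        raise ValueError("empty SMILES string")
    # Placeholder character embeddings: deterministic random vectors per character,
    # standing in for embeddings that a real model would learn during training.
    table = {}
    for ch in set(tokens):
        rng = random.Random(seed * 1_000_003 + ord(ch))
        table[ch] = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
    inputs = [
        [e + p for e, p in zip(table[ch], positional_encoding(pos, d_model))]
        for pos, ch in enumerate(tokens)
    ]
    hidden = self_attention(inputs)
    # Average over positions so molecules of any length map to d_model numbers.
    return [sum(column) / len(hidden) for column in zip(*hidden)]


if __name__ == "__main__":
    print(embed_smiles("CCO"))  # ethanol -> 8 numbers from untrained placeholder weights
```

Character-level tokenization splits two-letter atoms such as Cl into "C" and "l". Regex-based SMILES tokenizers keep such atoms whole, at the cost of a larger vocabulary.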

“…• drugs MTE : 512-dimensional Molecular Transformer Embeddings (MTEs) [47], fed into fully connected drug subnetworks.…”

Section: Testing the Impact of Different Methodological Variables (mentioning; confidence: 99%)

A systematic evaluation of deep learning methods for the prediction of drug synergy in cancer

Baptista, Ferreira, Rocha. 2022. Preprint.

One of the main obstacles to the successful treatment of cancer is the phenomenon of drug resistance. A common strategy to overcome resistance is the use of combination therapies. However, the space of possible combinations is huge, and efficient search strategies are required. Machine learning (ML) can be a useful tool for the discovery of novel, clinically relevant anti-cancer drug combinations. In particular, deep learning (DL) has become a popular choice for modeling drug combination effects. Here, we set out to examine the impact of different methodological choices on the performance of multimodal DL-based drug synergy prediction methods, including the use of different input data types, preprocessing steps, and model architectures. Focusing on the NCI ALMANAC dataset, we found that feature selection based on prior biological knowledge has a positive impact on performance. Drug features appeared to be more predictive of drug response. Molecular fingerprint-based drug representations performed slightly better than learned representations, and gene expression data restricted to cancer- or drug response-specific genes also improved performance. In general, fully connected feature-encoding subnetworks outperformed other architectures, and DL outperformed other ML methods. Using a state-of-the-art interpretability method, we showed that DL models can learn to associate drug and cell line features with drug response in a biologically meaningful way. The strategies explored in this study will help to improve the development of computational methods for the rational design of effective drug combinations for cancer therapy.

Author summary

Cancer therapies often fail because tumor cells become resistant to treatment. One way to overcome resistance is to treat patients with a combination of two or more drugs. Some combinations may be more effective than would be expected from the individual drug effects, a phenomenon called drug synergy. Computational drug synergy prediction methods can help to identify new, clinically relevant drug combinations. In this study, we developed several deep learning models for drug synergy prediction. We examined the effect of using different types of deep learning architectures and different ways of representing drugs and cancer cell lines. We explored the use of biological prior knowledge to select relevant cell line features, and also tested data-driven feature reduction methods. We tested both precomputed drug features and deep learning methods that can learn features directly from raw representations of molecules. We also evaluated whether including genomic features, in addition to gene expression data, improves the predictive performance of the models. Through these experiments, we were able to identify strategies that will help guide the development of new deep learning models for drug synergy prediction in the future.
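
The Baptista et al. excerpt mentions 512-dimensional Molecular Transformer Embeddings fed into fully connected drug subnetworks. The plain-Python sketch below shows what such a subnetwork computes: a stack of dense layers with ReLU activations that compresses a 512-dimensional embedding. Several details are assumptions for illustration, not taken from the study:

- the hidden sizes (128 and 64);
- the He-style random initialization;
- sharing one subnetwork between both drugs of a pair;
- the function names.

The weights are untrained.

```python
import random


def dense(x, weights, bias):
    """Fully connected layer: y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def init_layer(n_in, n_out, rng):
    """Random He-style initialization; placeholders, not trained weights."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_drug_subnetwork(layer_sizes=(512, 128, 64), seed=0):
    """Hypothetical layer sizes: a 512-dim MTE input followed by two hidden layers."""
    rng = random.Random(seed)
    return [init_layer(n_in, n_out, rng) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


def encode_drug(embedding, layers):
    """Pass one drug embedding through every dense + ReLU layer."""
    first_weights = layers[0][0]
    if len(embedding) != len(first_weights[0]):
        raise ValueError("embedding size does not match the first layer")
    h = embedding
    for weights, bias in layers:
        h = relu(dense(h, weights, bias))
    return h


def encode_pair(mte_a, mte_b, layers):
    """Encode both drugs with the same (shared) subnetwork and concatenate."""
    return encode_drug(mte_a, layers) + encode_drug(mte_b, layers)


if __name__ == "__main__":
    layers = build_drug_subnetwork()
    rng = random.Random(1)
    fake_mte = [rng.gauss(0.0, 1.0) for _ in range(512)]  # stand-in for a real MTE
    print(len(encode_drug(fake_mte, layers)))  # 64
```

Concatenating the two encodings makes the pair representation depend on drug order. Summing them is one order-invariant alternative.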

“…To the best of our knowledge, it is one of the first attempts to utilize Transformer-like models as sole predictors of the binding affinity. Worth mentioning here is a recent paper by Morris et al [27] adopting a Transformer approach for affinity prediction; however, their setup is limited to a single-receptor task, so embeddings are learned for ligand SMILES only.…”

Section: Methods (mentioning; confidence: 99%)

High throughput screening with machine learning

Gurbych, Druchok, Yarish, et al. 2020. Preprint.

This study assesses the efficiency of several popular machine learning approaches in predicting molecular binding affinity: CatBoost, Graph Attention Neural Network, and Bidirectional Encoder Representations from Transformers (BERT). The models were trained to predict binding affinities, expressed as inhibition constants Ki, for pairs of proteins and small organic molecules. The first two approaches use carefully selected physicochemical features, while the third is based on textual molecular representations; it is one of the first attempts to apply Transformer-based predictors to binding affinity. We also discuss the visualization of attention layers within the Transformer approach in order to highlight the molecular sites responsible for interactions. None of the approaches uses atomic spatial coordinates, which avoids bias from known structures and allows them to generalize to compounds with unknown conformations. The accuracy achieved by all of the proposed approaches demonstrates their potential for high throughput screening.
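
Gurbych et al. train on protein-ligand pairs with Ki targets. By contrast, the excerpt notes that Morris et al. learn embeddings for ligand SMILES only, in a single-receptor setting. The sketch below shows one hypothetical way to assemble pair-level training records.

Converting Ki to pKi = -log10(Ki in mol/L) is a common convention for affinity regression. It is assumed here rather than stated in the abstract. The record layout and function names are likewise hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AffinityExample:
    """One protein-ligand training pair (hypothetical record layout)."""
    protein_sequence: str  # amino-acid sequence in one-letter codes
    ligand_smiles: str     # ligand as a SMILES string
    pki: float             # regression target: -log10(Ki in mol/L)


def ki_to_pki(ki_nanomolar):
    """pKi = -log10(Ki in mol/L); with Ki given in nM this equals 9 - log10(Ki)."""
    if ki_nanomolar <= 0:
        raise ValueError("Ki must be positive")
    return 9.0 - math.log10(ki_nanomolar)


def make_example(protein_sequence, ligand_smiles, ki_nanomolar):
    """Build one pair-level record with a log-transformed affinity target."""
    return AffinityExample(protein_sequence, ligand_smiles, ki_to_pki(ki_nanomolar))


if __name__ == "__main__":
    # Toy inputs for illustration only, not data from the study.
    print(make_example("MKT", "CCO", 100.0))
```

For example, `ki_to_pki(10.0)` returns 8.0, since 10 nM is 1e-8 mol/L. Working on the log scale keeps targets spanning several orders of magnitude within a narrow numeric range.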
