A review on compound-protein interaction prediction methods: Data, format, representation and model

Lim, Sangsoo; Lu, Yijingxiu; Cho, Chang Yun; Sung, Inyoung; Kim, Jung-Woo; Kim, Young-Kuk; Park, Sungjoon; Kim, Sun

doi:10.1016/j.csbj.2021.03.004

Cited by 88 publications

(64 citation statements)

References 145 publications

“…In contrast to ligand-based approaches, structure-based methods (Lim et al, 2021) usually take structures of protein targets and/or protein-ligand complexes as inputs for affinity prediction. Some work (Wallach et al, 2015; Li et al, 2021b) predicts the binding affinity from experimentally determined protein-ligand co-crystal structures, but such data is highly expensive and time-consuming to obtain in practice.…”

Section: Structure-based Affinity Prediction (SBAP)

Mentioning

confidence: 99%

DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations

Ji, Zhang, Wu et al. 2022

Preprint

AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient. In spite of its extensive use in many fields, such as ADMET prediction, virtual screening, protein folding and generative chemistry, little has been explored in terms of the out-of-distribution (OOD) learning problem with noise, which is inevitable in real world AIDD applications. In this work, we present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes. We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction, which involves both macromolecule (protein target) and small-molecule (drug compound). In contrast to only providing fixed datasets, DrugOOD offers automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise annotations and rigorous benchmarking of state-of-the-art OOD algorithms. Since the molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for graph OOD learning problems. Extensive empirical studies have shown a significant performance gap between in-distribution and out-of-distribution experiments, which highlights the need to develop better schemes that can allow for OOD generalization under noise for AIDD.

“…The latter two techniques are involved as standard, classical, well-known techniques mainly for benchmarking; on the other hand, SVM, tree-based algorithms and neural networks are trending now in all aspects of data science. ML algorithms are routinely used in (i) bioactivity [5], as well as property predictions of drug related compounds [6]; (ii) de novo drug design, i.e., generation of new chemical structures of practical interest [7]; (iii) virtual screening [8]; (iv) prediction of reaction pathways [9] and (v) compound-protein interactions [10], etc. ML algorithms are mainly aimed at prediction, for which a great selection of descriptors and chemical representations, as well as many ML algorithms can be combined [11].…”

Section: Introduction

Mentioning

confidence: 99%

Machine learning models for classification tasks related to drug safety

et al. 2021

In this review, we outline the current trends in the field of machine learning-driven classification studies related to ADME (absorption, distribution, metabolism and excretion) and toxicity endpoints from the past six years (2015–2021). The study focuses only on classification models with large datasets (i.e. more than a thousand compounds). A comprehensive literature search and meta-analysis was carried out for nine different targets: hERG-mediated cardiotoxicity, blood–brain barrier penetration, permeability glycoprotein (P-gp) substrate/inhibitor, cytochrome P450 enzyme family, acute oral toxicity, mutagenicity, carcinogenicity, respiratory toxicity and irritation/corrosion. The comparison of the best classification models was targeted to reveal the differences between machine learning algorithms and modeling types, endpoint-specific performances, dataset sizes and the different validation protocols. Based on the evaluation of the data, we can say that tree-based algorithms are (still) dominating the field, with consensus modeling being an increasing trend in drug safety predictions. Although one can already find classification models with great performances to hERG-mediated cardiotoxicity and the isoenzymes of the cytochrome P450 enzyme family, these targets are still central to ADMET-related research efforts.

“…Recently, Lim’s team [41] published a review paper on compound protein interaction (CPI) prediction models that includes a precise description of the data format used, the techniques associated with model development and emerging methods. They also provide an overview of databases as chemistry-centric, protein-centric and integrated database and analyzed the diversified methods of AI like, tree, neural network, kernel and graph-based methods in the field of CPI.…”

Section: Introduction To Protein–ligand Interactions

Mentioning

confidence: 99%

Artificial intelligence in the prediction of protein–ligand interactions: recent advances and future directions

Dhakal, McKay, Tanner et al. 2021

Briefings in Bioinformatics

136

New drug production, from target identification to marketing approval, takes over 12 years and can cost around $2.6 billion. Furthermore, the COVID-19 pandemic has unveiled the urgent need for more powerful computational methods for drug discovery. Here, we review the computational approaches to predicting protein–ligand interactions in the context of drug discovery, focusing on methods using artificial intelligence (AI). We begin with a brief introduction to proteins (targets), ligands (e.g. drugs) and their interactions for nonexperts. Next, we review databases that are commonly used in the domain of protein–ligand interactions. Finally, we survey and analyze the machine learning (ML) approaches implemented to predict protein–ligand binding sites, ligand-binding affinity and binding pose (conformation) including both classical ML algorithms and recent deep learning methods. After exploring the correlation between these three aspects of protein–ligand interaction, it has been proposed that they should be studied in unison. We anticipate that our review will aid exploration and development of more accurate ML-based prediction strategies for studying protein–ligand interactions.
