2023
DOI: 10.1021/acsomega.2c06781

Latent Biases in Machine Learning Models for Predicting Binding Affinities Using Popular Data Sets

Abstract: Drug design involves the process of identifying and designing molecules that bind well to a given receptor. A vital computational component of this process is the protein−ligand interaction scoring functions that evaluate the binding ability of various molecules or ligands with a given protein receptor binding pocket reasonably accurately. With the publicly available protein−ligand binding affinity data sets in both sequential and structural forms, machine learning methods have gained traction as a top choice…

Citations: cited by 18 publications (16 citation statements)
References: 36 publications
“…In chemistry, random splitting has been shown to possess limitations in confirming the data generalizability 92,93. Specifically, in the context of predicting PLI, where proteins and ligands and their associated relationships significantly affect results, using random splitting for data construction can introduce severe biases 57,94. In the case of a refined (general)‐core splitting strategy, a model is trained with a refined or general set of PDBbind and tested on a high‐quality core set.…”
Section: Evaluating Generalizability Of Structure‐based PLI Models (mentioning)
confidence: 99%
“…92,93 Specifically, in the context of predicting PLI, where proteins and ligands and their associated relationships significantly affect results, using random splitting for data construction can introduce severe biases. 57,94 In the case of a refined (general)-core splitting strategy, a model is trained with a refined or general set of PDBbind and tested on a high-quality core set. All models that used the PDBbind dataset for training and the CASF-2016 benchmark dataset 20 for testing correspond to the refined-core splitting.…”
Section: Data Splitting Strategies (mentioning)
confidence: 99%
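The two splitting strategies contrasted in the statement above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the cited paper or of any citing work: the complex identifiers are hypothetical placeholders rather than real PDBbind entries, and the function names `refined_core_split` and `random_split` are invented for this example.

```python
import random


def refined_core_split(refined_ids, core_ids):
    """Train on refined-set complexes outside the core set; test on the core set."""
    core = set(core_ids)
    # Drop core-set complexes from training so the two sets do not overlap.
    train = [cid for cid in refined_ids if cid not in core]
    # Keep the core set in its given order, without duplicates.
    test = list(dict.fromkeys(core_ids))
    return train, test


def random_split(ids, test_fraction=0.2, seed=0):
    """Shuffle all complexes and hold out a fixed fraction as the test set."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical identifiers, for illustration only.
refined = ["cplx_a", "cplx_b", "cplx_c", "cplx_d", "cplx_e"]
core = ["cplx_b", "cplx_e"]

train, test = refined_core_split(refined, core)
r_train, r_test = random_split(refined, test_fraction=0.4, seed=1)
```

A random split can place closely related proteins or ligands on both sides of the train/test boundary, which is the source of bias the statement describes. Note that this sketch only removes identical identifiers from the training set; it does not check for protein or ligand similarity between the two sets.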
“…16 Furthermore, existing datasets such as PDBbind 17,18 commonly used for model development have been criticized as biased because similar prediction accuracies were achieved irrespective of whether the whole protein-ligand complex, only the protein, or only the ligand was considered. 2,19,20 This potential bias in dataset composition makes it challenging to meaningfully assess and compare model performance. 2,16,19,20 As a potential remedy to the challenge of learning underlying dataset biases rather than meaningfully capturing physical interactions, Rognan and coworkers suggested to consider "only noncovalent interactions while omitting their protein and ligand atomic environments".…”
Section: Introduction (mentioning)
confidence: 99%
“…2,19,20 This potential bias in dataset composition makes it challenging to meaningfully assess and compare model performance. 2,16,19,20 As a potential remedy to the challenge of learning underlying dataset biases rather than meaningfully capturing physical interactions, Rognan and coworkers suggested to consider "only noncovalent interactions while omitting their protein and ligand atomic environments". 2 In addition to geometric deep learning approaches, 5,23-26 also simulation-based approaches (e.g., free energy perturbation, 27,28 MM/PBSA 29-31 ) are frequently used for binding affinity prediction.…”
Section: Introduction (mentioning)
confidence: 99%