Improving AutoDock Vina Using Random Forest: The Growing Accuracy of Binding Affinity Prediction by the Effective Exploitation of Larger Data Sets

Li, Hongjian; Leung, Kwong‐Sak; Wong, Man‐Hon; Ballester, Pedro J.

doi:10.1002/minf.201400132

Cited by 217 publications

(264 citation statements)

References 69 publications


“…Many of them are provided in a way that does not permit changing the regression model, although a number of control parameters can be adjusted to tailor the SF to a particular target. Importantly, the underlying linear regression model employed by classical SFs has been shown to be unable to assimilate large amounts of structural and binding data [12].…”

mentioning

confidence: 99%

“…Indeed, the degree with which machine-learning SFs have outperformed classical SFs at binding affinity prediction has been highlighted by several reviews [13, 18, 19, 20]. Research has been carried out on various aspects of machine-learning SFs for binding affinity prediction: how target diversity affects predictive performance [21], the impact of structure-based feature selection on predictive performance [22], how to build machine-learning versions of classical SFs [23], how predictive performance increases with the size of the training data in both types of SFs [12], how the quality of structural and binding data influences predictive performance [24], which machine learning (ML) methods generate more predictive SFs [25], how to correct the impact of docking pose generation error on predictive performance [26] or the implementation of webservers [27] and stand-alone software [26, 28] to make these tools freely available. It is important to note that the validation of machine-learning SFs has generally been much more rigorous than that of most classical SFs [13].…”

mentioning

confidence: 99%

“…It is important to note that the validation of machine-learning SFs has generally been much more rigorous than that of most classical SFs [13]. For example, in building RF-Score v3 for binding affinity prediction, no overlapping between training and test sets is permitted by construction [12]. Importantly, any adjustable parameter of the machine-learning SF is selected from data not used to estimate the performance of the model [13] (e.g.…”

mentioning

confidence: 99%

“…We use the full DUD-E [5] data sets for model building and performance assessment across 102 targets using three docking tools to generate the corresponding poses. Three machine-learning SFs using structural features with different degrees of complexity are used [12, 14, 22] and compared to five classical SFs. We assess the VS performance of the SFs in both established-target and novel-target settings, either tailored for broad application or for a specific target.…”

mentioning

confidence: 99%


Performance of machine-learning scoring functions in structure-based virtual screening

Wójcikowski

Ballester

Siedlecki

2017

Sci Rep


Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specifically concerning model overfitting and applicability to novel targets. Here we provide a new ready-to-use scoring function (RF-Score-VS) trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets. We use the full DUD-E data sets along with three docking tools, five classical and three machine-learning scoring functions for model building and performance assessment. Our results show RF-Score-VS can substantially improve virtual screening performance: the top 1% ranked by RF-Score-VS has a 55.6% hit rate, whereas that of Vina is only 16.2% (at smaller percentages the difference is even more encouraging: the RF-Score-VS top 0.1% achieves an 88.6% hit rate, versus 27.5% using Vina). In addition, RF-Score-VS provides much better prediction of measured binding affinity than Vina (Pearson correlation of 0.56 and −0.18, respectively). Lastly, we test RF-Score-VS on an independent test set from the DEKOIS benchmark and observe comparable results. We provide full data sets to facilitate further research in this area (http://github.com/oddt/rfscorevs) as well as ready-to-use RF-Score-VS (http://github.com/oddt/rfscorevs_binary).
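The two metrics quoted in this abstract, the hit rate in the top-ranked fraction of a screened library and the Pearson correlation between predicted and measured affinity, can be illustrated with a minimal sketch. The function names, the toy data and the "higher score is better" convention are assumptions made for illustration; they are not taken from the paper or from the RF-Score-VS code (Vina, for instance, reports energies where lower is better, so its scores would need to be negated first).

```python
import math


def hit_rate_at_top(scores, is_active, fraction):
    """Share of actives among the top-scoring `fraction` of compounds.

    Assumes a higher score means a better-ranked compound (hypothetical
    convention for this sketch).
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_top = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    return sum(1 for _, active in ranked[:n_top] if active) / n_top


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy library of ten compounds (made-up values, not data from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_active = [True, False, True, False, False, False, False, False, False, False]
top_20_percent_hit_rate = hit_rate_at_top(toy_scores, toy_active, 0.2)
```

On this toy library the top 20% contains two compounds, one of which is active, so the hit rate is 0.5. A top-1% hit rate such as those quoted above would be obtained in the same way with `fraction=0.01` on a full ranked library.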


“…In docking method, the structures are evaluated on the basis of a force field or a scoring function [8]. It predicts the preferred conformations and binding strength of a ligand molecule, typically a small organic molecule, as bound to a protein pocket [9]. Docking provides a reasonable accuracy in predicting DTI when 3D structure of protein and large quantities of data are present [10].…”

Section: Introduction

mentioning

confidence: 99%

Pred-binding: large-scale protein–ligand binding affinity prediction

Shar

Tao

Gao

et al. 2016

Journal of Enzyme Inhibition and Medicinal Chemistry


Drug target interactions (DTIs) are crucial in pharmacology and drug discovery. Presently, experimental determination of compound-protein interactions remains challenging because of the funding required and the difficulty of purifying proteins. In this study, we propose two in silico models based on support vector machine (SVM) and random forest (RF), using 1589 molecular descriptors and 1080 protein descriptors in 9948 ligand-protein pairs to predict DTIs quantified by Ki values. Cross-validation coefficients of determination of 0.6079 for SVM and 0.6267 for RF were obtained. In addition, the two-dimensional (2D) autocorrelation, topological charge indices and three-dimensional (3D)-MoRSE descriptors of compounds, and the autocorrelation descriptors and the amphiphilic pseudo-amino acid composition of proteins, were found to be most important for Ki predictions. These models provide a new opportunity for the prediction of ligand-receptor interactions that will facilitate target discovery and toxicity evaluation in drug development.
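As a rough illustration of what a cross-validation coefficient of determination measures, the sketch below computes R² as 1 − SS_res/SS_tot and shows one simple way to split sample indices into k folds. This is one common definition, assumed here for illustration; the abstract does not state the exact formula or fold scheme the authors used, and the helper names `r_squared` and `k_fold_splits` are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k interleaved folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]          # every k-th sample, offset by fold
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


# Each sample is held out exactly once; a model (e.g. SVM or RF) would be
# fitted on `train` and asked to predict `test` in every fold, and the pooled
# out-of-fold predictions would then be scored with r_squared.
folds = list(k_fold_splits(5, 2))
```

A predictor that always returns the mean of the observed values scores 0 under this definition, and a perfect predictor scores 1, which gives a scale for reading values such as 0.6079 and 0.6267.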
