Machine learning classification can reduce false positives in structure-based virtual screening

Adeshina, Yusuf; Deeds, Eric J.; Karanicolas, John

doi:10.1101/2020.01.10.902411

Cited by 12 publications

(32 citation statements)

References 105 publications

(134 reference statements)

“…The D-COID dataset is available at https://data.mendeley.com/datasets/8czn4rxz68/ ( 93 ). vScreenML is available at https://github.com/karanicolaslab/vScreenML .…”

Section: Methods (mentioning)

confidence: 99%

Machine learning classification can reduce false positives in structure-based virtual screening

Adeshina

Deeds

Karanicolas

2020

Proc. Natl. Acad. Sci. U.S.A.

Self Cite

154

120

With the recent explosion in the size of libraries available for screening, virtual screening is positioned to assume a more prominent role in early drug discovery’s search for active chemical matter. In typical virtual screens, however, only about 12% of the top-scoring compounds actually show activity when tested in biochemical assays. We argue that most scoring functions used for this task have been developed with insufficient thoughtfulness into the datasets on which they are trained and tested, leading to overly simplistic models and/or overtraining. These problems are compounded in the literature because studies reporting new scoring methods have not validated their models prospectively within the same study. Here, we report a strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes that are individually matched to available active complexes. Using this dataset, we train a general-purpose classifier for virtual screening (vScreenML) that is built on the XGBoost framework. In retrospective benchmarks, our classifier shows outstanding performance relative to other scoring functions. In a prospective context, nearly all candidate inhibitors from a screen against acetylcholinesterase show detectable activity; beyond this, 10 of 23 compounds have IC50 better than 50 μM. Without any medicinal chemistry optimization, the most potent hit has IC50 280 nM, corresponding to Ki of 173 nM. These results support using the D-COID strategy for training classifiers in other computational biology tasks, and for vScreenML in virtual screening campaigns against other protein targets. Both D-COID and vScreenML are freely distributed to facilitate such efforts.

“…AI can find new molecular compounds and emerging drug targets much faster than traditional methods, thus speeding up the progress of drug development [184,185]. At the same time, AI can more accurately predict the follow-up experimental results of new drugs, so as to improve the accuracy at each stage of drug development [186]. Computer-aided drug design techniques are thus revolutionizing MSCs therapies.…”

Section: Artificial Intelligence (AI) in MSC Treatment (mentioning)

confidence: 99%

Challenges and advances in clinical applications of mesenchymal stromal cells

et al. 2021

Mesenchymal stromal cells (MSCs), also known as mesenchymal stem cells, have been intensely investigated for clinical applications within the last decades. However, the majority of registered clinical trials applying MSC therapy for diverse human diseases have fallen short of expectations, despite the encouraging pre-clinical outcomes in varied animal disease models. This can be attributable to inconsistent criteria for MSCs identity across studies and their inherited heterogeneity. Nowadays, with the emergence of advanced biological techniques and substantial improvements in bio-engineered materials, strategies have been developed to overcome clinical challenges in MSC application. Here in this review, we will discuss the major challenges of MSC therapies in clinical application, the factors impacting the diversity of MSCs, the potential approaches that modify MSC products with the highest therapeutic potential, and finally the usage of MSCs for COVID-19 pandemic disease.

“…Adeshina et al 35 presented an unusually comprehensive study. Not only they investigated a new ML‐based SF (vScreenML), they also presented a new benchmark (D‐COID) and a prospective application of vScreenML, the two latter parts analyzed elsewhere in this review.…”

Section: ML‐based Scoring Functions for SBVS (mentioning)

confidence: 99%

“…Other ways to make benchmarks more realistic have been proposed such as considering true instead of assumed inactives 110 or selecting decoys that are 3D similar to their actives. 35 Benchmarks should be compared to selecting these assumed inactives at random and ultimately assessing how well these anticipate prospective performance. The latter was done by Sun et al, 52 who…”

Section: What Are the Limitations of Commonly Used Retrospective Benc… (mentioning)

confidence: 99%

“…So how can we effectively anticipate which SFs would work best on a given target? Evaluating the SFs on a set of actives and their decoys selected in a different manner than those employed in the training set is a way to ensure that we are not exploiting the way these molecules were selected 35,37,63,77,78 . In addition, some authors have proposed ways to make benchmarks more realistic by considering true instead of assumed inactives 110 or selecting decoys that are 3D similar to their actives 35 .…”

Section: Benchmarking Scoring Functions for SBVS (mentioning)

confidence: 99%

Machine‐learning scoring functions for structure‐based virtual screening

Sze

Lü

et al. 2020

WIREs Comput Mol Sci

135

Molecular docking predicts whether and how small molecules bind to a macromolecular target using a suitable 3D structure. Scoring functions for structure-based virtual screening primarily aim at discovering which molecules bind to the considered target when these form part of a library with a much higher proportion of non-binders. Classical scoring functions are essentially models building a linear mapping between the features describing a protein-ligand complex and its binding label. Machine learning, a major subfield of artificial intelligence, can also be used to build fast supervised learning models for this task. In this review, we analyzed such machine-learning scoring functions for structure-based virtual screening in the period 2015-2019. We have discussed what the shortcomings of current benchmarks really mean and what valid alternatives have been employed. The latter retrospective studies observed that machine-learning scoring functions were substantially more accurate, in terms of higher hit rates and potencies, than the classical scoring functions they were compared to. Several of these machine-learning scoring functions were also employed in prospective studies, in which mid-nanomolar binders with novel chemical structures were directly discovered without any potency optimization. We have thus highlighted the codes and webservers that are available to build or apply machine-learning scoring functions to prospective structure-based virtual screening studies. A discussion of prospects for future work completes this review.
