Improved method of structure-based virtual screening based on ensemble learning

Li, Jin; Liu, Weichao; Song, Yongping; Xia, JiYi

doi:10.1039/c9ra09211k

Cited by 12 publications

(11 citation statements)

References 46 publications


“…In addition to evaluating this default ensemble, we also show results for the General ensemble, which combines the simplest model, Default2018, with the smallest training set, redocked poses from the 2016 PDBbind General set, and the Dense ensemble, which combines the largest model with the largest training set, CrossDocked2020 [45]. The variations in architecture and training data allow us to compare the effects of these aspects of the CNN scoring functions on virtual screening performance, while the ensembles themselves are expected to improve average predictive accuracy by reducing the effects of bias from individual learners [47] and in theory allow us to approximate the uncertainty in our predictions [48,49].…”

Section: Methods (mentioning; confidence: 99%)

Virtual Screening with Gnina 1.0

Sunseri, Koes (2021), Molecules


Virtual screening—predicting which compounds within a specified compound library bind to a target molecule, typically a protein—is a fundamental task in the field of drug discovery. Doing virtual screening well provides tangible practical benefits, including reduced drug development costs, faster time to therapeutic viability, and fewer unforeseen side effects. As with most applied computational tasks, the algorithms currently used to perform virtual screening feature inherent tradeoffs between speed and accuracy. Furthermore, even theoretically rigorous, computationally intensive methods may fail to account for important effects relevant to whether a given compound will ultimately be usable as a drug. Here we investigate the virtual screening performance of the recently released Gnina molecular docking software, which uses deep convolutional networks to score protein-ligand structures. We find, on average, that Gnina outperforms conventional empirical scoring. The default scoring in Gnina outperforms the empirical AutoDock Vina scoring function on 89 of the 117 targets of the DUD-E and LIT-PCBA virtual screening benchmarks with a median 1% early enrichment factor that is more than twice that of Vina. However, we also find that issues of bias linger in these sets, even when not used directly to train models, and this bias obfuscates to what extent machine learning models are achieving their performance through a sophisticated interpretation of molecular interactions versus fitting to non-informative simplistic property distributions.

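The abstract above reports a median 1% early enrichment factor (EF1%). As a hedged sketch, assuming the common definition (hit rate among the top-ranked x% of the library divided by the hit rate of the whole library) and a rounding convention for the size of the top slice that is our own choice and not necessarily the one used in the cited study, EF can be computed as follows. The function name, its parameters, and the example data are hypothetical.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Early enrichment factor at `fraction` of the ranked library.

    Illustrative assumptions (not taken from the cited paper):
    - a higher score means "more likely to bind";
    - labels are 1 for actives and 0 for decoys/inactives;
    - the top slice holds round(fraction * N) compounds, at least one.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    n_top = max(1, round(fraction * n_total))
    # Rank compounds from best to worst score; the sort is stable, so ties keep input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_actives = sum(label for _, label in ranked[:n_top])
    # Hit rate in the top slice relative to the hit rate expected at random.
    return (top_actives / n_top) / (n_actives / n_total)


# Hypothetical library: 200 compounds, 10 actives, 2 of which are ranked first.
scores = list(range(200, 0, -1))
labels = [1, 1] + [0] * 190 + [1] * 8
print(f"EF1% = {enrichment_factor(scores, labels):.1f}")
```

Computing one EF per target and summarizing with `statistics.median` gives a per-benchmark figure of the kind quoted in the abstract; a per-target comparison of two scoring functions then reduces to counting the targets on which one EF exceeds the other.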


“…In addition to evaluating this default ensemble, we also show results for the General ensemble, which combines the simplest model, Default2018, with the smallest training set, redocked poses from the 2016 PDBbind General set, and the Dense ensemble, which combines the largest model with the largest training set, CrossDocked2020 [44]. The variations in architecture and training data allow us to compare the effects of these aspects of the CNN scoring functions on virtual screening performance, while the ensembles themselves are expected to improve average predictive accuracy by reducing the effects of bias from individual learners [47] and in theory allow us to approximate the uncertainty in our predictions [48,49].…”

Section: Models (mentioning; confidence: 99%)

Virtual Screening with Gnina 1.0

Sunseri, Koes (2021), Preprint

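Both Gnina citation statements above say the ensembles are expected to reduce the bias of individual learners and, in theory, to approximate predictive uncertainty. A minimal sketch of that idea, assuming each ensemble member returns one scalar score per pose, averages the member scores and uses their spread as a rough disagreement measure. The function and the example scores are hypothetical and do not reproduce Gnina's internals.

```python
from statistics import mean, pstdev


def ensemble_score(member_scores):
    """Combine one pose's scores from several models.

    Returns (mean score, population standard deviation). The mean is the
    ensemble prediction; the standard deviation measures how much the
    members disagree, a rough and uncalibrated proxy for uncertainty.
    """
    if not member_scores:
        raise ValueError("need at least one ensemble member")
    return mean(member_scores), pstdev(member_scores)


# Hypothetical scores for one pose from five ensemble members.
members = [0.82, 0.75, 0.90, 0.78, 0.85]
prediction, spread = ensemble_score(members)
print(f"prediction={prediction:.3f} spread={spread:.3f}")
```

Averaging helps most when the members make partly independent errors; members that share architecture and training data tend to agree, so a small spread does not by itself guarantee a reliable prediction.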

“…However, cross-docking comparative studies, more indicative of a real prospective study than self-docking, are much less common and indicate significantly lower pose prediction accuracy than self-docking. In addition, retrospective virtual screening studies rarely demonstrate area under the receiver operating curve (AUROC) values exceeding 0.8, although machine learning techniques have been proposed to improve this accuracy. Among the challenges noted are the large chemical diversity in ligands, numerous classes of proteins with varying structural features, as well as biases in available testing sets.…”

Section: Introduction (mentioning; confidence: 99%)

Docking Ligands into Flexible and Solvated Macromolecules. 8. Forming New Bonds─Challenges and Opportunities

Labarre, Stille, Patrascu, et al. (2022), J. Chem. Inf. Model.


Over the years, structure-based design programs and specifically docking small molecules to proteins have become prominent in drug discovery. However, many of these computational tools have been developed to primarily dock enzyme inhibitors (and ligands to other protein classes) relying heavily on hydrogen bonds and electrostatic and hydrophobic interactions. In reality, many drug targets either feature metal ions, can be targeted covalently, or are simply not even proteins (e.g., nucleic acids). Herein, we describe several new features that we have implemented into FITTED to broaden its applicability to a wide range of covalent enzyme inhibitors and to metalloenzymes, where metal coordination is essential for drug binding. This updated version of our docking program was tested for its ability to predict the correct binding mode of drug-sized molecules in a large variety of proteins. We also report new datasets that were essential to demonstrate areas of success and those where additional efforts are required. This resource could be used by other program developers to assess their own software.

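The Labarre et al. citation statement above uses AUROC as its yardstick for virtual screening. AUROC equals the probability that a randomly chosen active is scored above a randomly chosen inactive, so a value of 0.8 means an active outranks an inactive in 80% of such pairs. A minimal pairwise sketch, assuming higher scores mean stronger predicted binding and counting ties as one half (the function name and data are hypothetical):

```python
def auroc(scores, labels):
    """AUROC as the fraction of (active, inactive) pairs ranked correctly.

    labels: 1 for actives, 0 for inactives/decoys. Ties count one half.
    This O(P * N) pairwise form is written for clarity, not speed.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # active correctly ranked above the inactive
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical ranking: 3 of the 4 active/inactive pairs are ordered correctly.
print(auroc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

For large libraries, a rank-based (Mann-Whitney U) computation or a library routine such as scikit-learn's `roc_auc_score` gives the same value far more efficiently.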

Improved method of structure-based virtual screening based on ensemble learning

Abstract: Virtual screening has become a successful alternative and complement to experimental high-throughput screening technologies for drug design. This paper proposes a target-specific virtual screening method based on ensemble learning, named ENS-VS.
