Ensembling machine learning models to boost molecular affinity prediction

Druchok, Maksym; Yarish, Dzvenymyra; Garkot, Sofiya; Nikolaienko, Tymofii Yu.; Gurbych, Oleksandr

doi:10.1016/j.compbiolchem.2021.107529

Cited by 15 publications

(13 citation statements)

References 57 publications

“…This ensemble approach has been shown multiple times to improve predictions of machine learning and deep learning models (Hansen and Salamon 1990;Ashtawy and Mahapatra 2015;Ericksen et al, 2017;Francoeur et al, 2020;Kwon et al, 2020;Meli et al, 2021). More generally, a consensus score amongst multiple models (also with different architectures) can be used as well (Druchok et al, 2021), and the average between different models (different architectures and/or different training data sets) has been shown to improve pose predictions with CNN scoring functions (McNutt et al, 2021). While the average across different models is often used to estimate the performance of the ensemble, the standard deviation across predictions gives information about their stability and can be used as a diagnostic tool.…”

Section: Discussion (mentioning)

confidence: 99%

Scoring Functions for Protein-Ligand Binding Affinity Prediction Using Structure-based Deep Learning: A Review

2022

The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
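The citation statement above notes that the average across models is commonly taken as the ensemble prediction, while the standard deviation across models indicates how stable that prediction is. A minimal sketch of that idea follows, using only the Python standard library. The function name `ensemble_summary` and the example affinity values are hypothetical and are not taken from any of the cited papers.

```python
from statistics import mean, stdev


def ensemble_summary(predictions_per_model):
    """Return one (mean, std) pair per sample, computed across models.

    predictions_per_model: list of equal-length lists, one list per model.
    """
    if not predictions_per_model:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")

    summary = []
    # zip(*...) regroups the values so that each tuple holds every
    # model's prediction for a single sample.
    for sample_preds in zip(*predictions_per_model):
        avg = mean(sample_preds)
        # The sample standard deviation needs at least two values.
        spread = stdev(sample_preds) if len(sample_preds) > 1 else 0.0
        summary.append((avg, spread))
    return summary


if __name__ == "__main__":
    # Hypothetical predicted affinities from three models (illustrative only).
    model_a = [6.1, 7.4, 5.0]
    model_b = [6.3, 7.0, 8.0]
    model_c = [5.9, 7.2, 6.5]
    for i, (avg, spread) in enumerate(ensemble_summary([model_a, model_b, model_c])):
        print(f"sample {i}: mean={avg:.2f}, std={spread:.2f}")
```

A large spread for a sample (the third one in the illustrative data) flags a prediction on which the models disagree, which is the diagnostic use mentioned in the statement.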

“…However, the proposed method successfully avoids the overfitting, which can be deduced from the fact that the trained model gives accurate capacity estimation for cell #8 despite the training data includes abnormal capacity drop of cell #5. This is also beneficial from the inherent voting mechanism under ensembling framework, which makes the trained model prefer the majority rather than focusing on the outlier [43].…”

Section: E More Discussion (mentioning)

confidence: 99%

A Novel Capacity Estimation Method for Li-Ion Battery Cell by Applying Ensemble Learning to Extremely Sparse Significant Points

et al. 2022

Accurate capacity estimation is important for safe operation of battery. Existing advanced researches heavily rely on feature engineering to model capacity degradation, where features are difficult to design and extract. In this paper, a novel purely data-driven capacity estimation method is proposed by applying ensemble learning to extremely sparse significant points on voltage and/or temperature curve. The significant points are just raw points evenly distributed throughout the charging/discharging process, or points corresponding to specific SoC, which is easy to extract without complicated feature engineering process on raw data. A novel ensemble learning framework incorporating light gradient boosting decision tree (LightGBM) and neural network is employed to find the regression relationship between significant points and battery capacity. Public battery dataset collected by Oxford university is used to verify the effectiveness of the proposed method. Results show that for Oxford dataset, the maximum and mean capacity estimation error could be controlled within 1.25% and 0.5% respectively, which is superior to most of existing capacity estimation methods. Moreover, robustness, generalization ability and model application are well discussed.

INDEX TERMS: Capacity estimation; ensemble learning; Data-driven; Significant points.
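The abstract above describes the "significant points" as raw points evenly distributed along the charging/discharging curve. One way this extraction step could look is sketched below. The function name `sample_evenly`, the choice of five points, and the synthetic voltage curve are assumptions made for illustration. The abstract does not specify how the points are indexed or how the LightGBM and neural network outputs are combined, so that part is not shown.

```python
def sample_evenly(curve, k):
    """Pick k raw points spread evenly (by index) over a measured curve.

    The first and last measurements are always included when k >= 2.
    """
    n = len(curve)
    if k < 1 or k > n:
        raise ValueError("k must be between 1 and len(curve)")
    if k == 1:
        return [curve[0]]
    # Integer arithmetic keeps the index choice deterministic.
    return [curve[(i * (n - 1)) // (k - 1)] for i in range(k)]


if __name__ == "__main__":
    # Synthetic, hypothetical charging voltage curve with 101 samples.
    voltages = [3.0 + 0.012 * i for i in range(101)]
    features = sample_evenly(voltages, 5)
    # These few raw values would serve as the regressor inputs.
    print(features)
```

No hand-crafted features are computed here: the sampled raw values themselves form the input vector, which is the point the abstract makes about avoiding feature engineering.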

“…Another two‐stage ensembling pipeline is suggested in Ref. [23], uniting six ML techniques to enhance the DTA prediction. The methods include Support Vector Machine, Random Forest, CatBoost [24], feed‐forward neural network, graph neural network, and Bidirectional Encoder Representations from Transformers [25]…”

Section: Introduction (mentioning)

confidence: 99%

Complex machine learning model needs complex testing: Examining predictability of molecular binding affinity by a graph neural network

2022

Self Cite

Drug discovery pipelines typically involve high-throughput screening of large amounts of compounds in a search of potential drugs candidates. As a chemical space of small organic molecules is huge, a "navigation" over it urges for fast and lightweight computational methods, thus promoting machine-learning approaches for processing huge pools of candidates. In this contribution, we present a graph-based deep neural network for prediction of protein-drug binding affinity and assess its predictive power under thorough testing conditions. Within the suggested approach, both protein and drug molecules are represented as graphs and passed to separate graph sub-networks, then concatenated and regressed towards a binding affinity. The neural network is trained on two binding affinity datasets: PDBbind and data imported from RCSB Protein Data Bank. In order to explore the generalization capabilities of the model we go beyond traditional random or leave-cluster-out techniques and demonstrate the need for more elaborate model performance assessment: six different strategies for test/train data partitioning (random, time- and property-arranged, protein- and ligand-clustered) with a k-fold cross-validation are engaged.
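The abstract above contrasts random partitioning with ordered schemes such as time-arranged splits, combined with k-fold cross-validation. The sketch below illustrates how a random split, a time-arranged split, and a plain k-fold scheme differ. All function names, the record layout `(complex_id, release_year)`, and the data are hypothetical. The property-arranged and cluster-based strategies from the abstract are not shown, and this is not the authors' implementation.

```python
import random


def random_split(items, test_fraction=0.2, seed=0):
    """Shuffle the records, then hold out a fraction as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def time_split(items, get_time, test_fraction=0.2):
    """Train on the oldest records and test on the most recent ones."""
    ordered = sorted(items, key=get_time)
    n_test = max(1, int(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]


def k_fold(items, k):
    """Yield (train, test) pairs so that each record is tested exactly once."""
    items = list(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


if __name__ == "__main__":
    # Hypothetical records: (complex identifier, release year).
    complexes = [(f"complex_{i}", 2000 + i) for i in range(10)]

    train, test = time_split(complexes, get_time=lambda rec: rec[1])
    print("time-arranged test years:", [year for _, year in test])

    train, test = random_split(complexes, seed=42)
    print("random test years:", sorted(year for _, year in test))

    for fold_id, (train, test) in enumerate(k_fold(complexes, k=5)):
        print(f"fold {fold_id}: {len(train)} train / {len(test)} test")
```

In the time-arranged split the test set contains only records newer than everything in the training set, so it probes extrapolation to later data, whereas a random split mixes old and new records in both sets.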
