Accurate prediction of protein–ligand binding free energies
is important in enzyme engineering and drug discovery. The molecular
mechanics/generalized Born surface area (MM/GBSA) approach is widely
used to estimate ligand-binding affinities, but its performance heavily
relies on the accuracy of its energy components. A hybrid strategy
combining MM/GBSA and machine learning (ML) has been developed to
predict the binding free energies of protein–ligand systems.
Built on the MM/GBSA energy terms and several features associated
with protein–ligand interactions, our ML-based scoring function,
GXLE, performs much better than MM/GBSA without the entropy term.
In particular, the GXLE model transfers well: it retains its ranking
power when predicting the binding affinities of different ligands from
either docked structures or crystal structures. The GXLE scoring function and its code are freely
available and can be used to correct the binding free energies computed
by MM/GBSA.
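
The idea of correcting MM/GBSA with a model trained on its own energy terms can be illustrated with a minimal sketch. This is not the GXLE implementation: the four energy-term columns, the synthetic data, and the plain least-squares fit (standing in for the ML model) are all assumptions made for illustration. It only shows the general pattern of learning a mapping from energy components to measured binding free energies and comparing it with the raw sum of terms on held-out complexes.

```python
import numpy as np

def fit_linear_correction(energy_terms, experimental_dg):
    """Fit weights and an intercept that map energy terms to measured dG."""
    X = np.column_stack([energy_terms, np.ones(len(energy_terms))])
    coef, *_ = np.linalg.lstsq(X, experimental_dg, rcond=None)
    return coef

def predict(coef, energy_terms):
    """Apply the fitted weights and intercept to new energy terms."""
    X = np.column_stack([energy_terms, np.ones(len(energy_terms))])
    return X @ coef

def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Synthetic stand-in data (assumption): 200 complexes, 4 hypothetical
# energy-term columns (e.g. van der Waals, electrostatic, GB, SA).
rng = np.random.default_rng(0)
n = 200
terms = rng.normal(size=(n, 4))
true_weights = np.array([0.3, 0.05, 0.08, 0.5])
dg_exp = terms @ true_weights - 6.0 + rng.normal(scale=0.2, size=n)

# Uncorrected estimate: the unweighted sum of the energy terms.
raw_estimate = terms.sum(axis=1)

# Train on the first 150 complexes, evaluate on the remaining 50.
train, test = slice(0, 150), slice(150, None)
coef = fit_linear_correction(terms[train], dg_exp[train])
corrected = predict(coef, terms[test])

rmse_raw = rmse(raw_estimate[test], dg_exp[test])
rmse_corrected = rmse(corrected, dg_exp[test])
print(f"RMSE raw: {rmse_raw:.2f}  RMSE corrected: {rmse_corrected:.2f}")
```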

Water molecules at the ligand–protein interfaces play crucial
roles in the binding of the ligands, but the behavior of protein-bound
water is largely ignored in many currently used machine learning (ML)-based
scoring functions (SFs). In an attempt to improve the prediction performance
of existing ML-based SFs, we estimated the water distribution with
the HydraMap (HM) method and then incorporated the features extracted
from protein-bound waters obtained in this way into three ML-based
SFs: RF-Score, ECIF, and PLEC. We found that a combination of
HM-based features consistently improves the scoring, ranking, and docking
power of all three SFs. HydraMap-based
features show consistently good performance with both crystal structures
and docked structures, demonstrating their robustness for SFs. Overall,
HM-based features, which are a statistical representation of hydration
sites at protein–ligand interfaces, are expected to improve
the prediction performance for diverse SFs.
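
Incorporating hydration features into an existing SF amounts to appending extra descriptor columns to each complex's feature vector before training. The sketch below illustrates that step together with a rank correlation of the kind commonly used to report ranking power. The column counts, the placeholder values, and the choice of Spearman's coefficient are assumptions; the abstract does not describe HydraMap's output format or the exact metrics used.

```python
import numpy as np

def augment_features(base_features, hydration_features):
    """Append hydration-site descriptors column-wise to baseline SF descriptors."""
    base = np.asarray(base_features, dtype=float)
    hydration = np.asarray(hydration_features, dtype=float)
    if base.shape[0] != hydration.shape[0]:
        raise ValueError("both feature blocks must describe the same complexes")
    return np.hstack([base, hydration])

def spearman_rho(predicted, measured):
    """Spearman rank correlation; this simple version assumes no tied values."""
    rank_pred = np.argsort(np.argsort(predicted))
    rank_meas = np.argsort(np.argsort(measured))
    return float(np.corrcoef(rank_pred, rank_meas)[0, 1])

# Placeholder blocks (assumption): 5 complexes, 36 baseline descriptors,
# 3 hypothetical hydration-derived descriptors.
base = np.zeros((5, 36))
hydration = np.ones((5, 3))
combined = augment_features(base, hydration)

# Identical orderings give rho = 1; fully reversed orderings give rho = -1.
rho_same = spearman_rho([1.2, 3.4, 2.2, 5.0], [6.1, 8.0, 7.3, 9.9])
rho_reversed = spearman_rho([1.0, 2.0, 3.0], [9.0, 5.0, 1.0])
```

The combined matrix would then be passed to whichever learner the SF already uses, so the baseline and augmented models can be compared on the same complexes.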