Protein–ligand scoring functions
are widely used in structure-based
drug design for fast evaluation of protein–ligand interactions,
and there is strong interest in developing scoring functions with machine-learning
approaches. In this work, by expanding the training set, developing
physically meaningful features, employing our recently developed linear
empirical scoring function Lin_F9 (Yang, C. J. Chem. Inf. Model. 2021, 61, 4630–4644) as the
baseline, and applying extreme gradient boosting (XGBoost) with Δ-machine
learning, we have further improved the robustness and applicability
of machine-learning scoring functions. In addition to top performance
in the scoring, ranking, and screening power tests of the CASF-2016 benchmark,
the new scoring function ΔLin_F9XGB also achieves
superior scoring and ranking performance across different structure types
that mimic real docking applications. The scoring powers of ΔLin_F9XGB for locally optimized poses, flexible redocked poses,
and ensemble docked poses of the CASF-2016 core set achieve Pearson’s
correlation coefficient (R) values of 0.853, 0.839,
and 0.813, respectively. In addition, the large-scale docking-based
virtual screening test on the LIT-PCBA data set demonstrates the reliability
and robustness of ΔLin_F9XGB in virtual screening
applications. The ΔLin_F9XGB scoring function and
its code are freely available on the web at ().
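The Δ-machine-learning idea described above (a learned correction added on top of a baseline score such as Lin_F9) and the Pearson's R used to report scoring power can be illustrated with the minimal Python sketch below. It is a toy under stated assumptions, not the ΔLin_F9XGB implementation. A single hypothetical descriptor replaces the paper's feature set, and an ordinary least-squares line stands in for XGBoost so that the example needs only the standard library. All numbers are made up. The names `delta_fit`, `fit_linear_correction`, and `pearson_r` are hypothetical.

```python
"""Toy sketch of Δ-machine learning plus a CASF-style scoring-power check.

Assumptions (not from the paper): one hypothetical descriptor, a
least-squares line standing in for XGBoost, and made-up numbers.
"""
from statistics import fmean


def fit_linear_correction(xs, ys):
    """Least-squares fit of y = a * x + b (stand-in for the XGBoost model)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    b = my - a * mx
    return lambda x: a * x + b


def delta_fit(baseline_scores, features, targets):
    """Δ-learning: fit the model to (target - baseline), not to the target itself."""
    residuals = [t - s for t, s in zip(targets, baseline_scores)]
    correction = fit_linear_correction(features, residuals)

    def predict(baseline_score, feature):
        # Final score = baseline score + learned correction
        return baseline_score + correction(feature)

    return predict


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between predicted and measured affinities."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical training data: baseline scores, one descriptor, measured pKd
    baseline = [5.0, 6.0, 7.0, 8.0]
    feature = [1.0, 2.0, 3.0, 4.0]
    measured = [5.6, 7.1, 8.6, 10.1]
    model = delta_fit(baseline, feature, measured)
    preds = [model(s, f) for s, f in zip(baseline, feature)]
    print(round(model(9.0, 5.0), 3))  # → 11.6
    print(round(pearson_r(preds, measured), 3))  # → 1.0
```

In a Δ-learning setup like this, replacing the least-squares line with a gradient-boosted regressor trained on the same residuals would leave the structure unchanged. Only the correction model changes, while the baseline score still supplies the physics-based estimate.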