Discovery and optimization of small molecule inhibitors as therapeutic
drugs have immensely benefited from rational structure-based drug
design. With recent advances in high-resolution structure determination,
computational power, and machine learning methodology, it is becoming
more tractable to elucidate the structural basis of drug potency. However, the applicability of machine
learning models to drug design is limited by how difficult the resulting
models are to interpret in terms of feature importance. Here, we take
advantage of the large number of available inhibitor-bound HIV-1 protease
structures and associated potencies to evaluate inhibitor diversity
and machine learning models for predicting ligand affinity. First, using
a hierarchical clustering approach, we grouped HIV-1 protease inhibitors
and identified distinct core structures. Explicit features including
protein–ligand interactions were extracted from high-resolution
cocrystal structures as 3D-based fingerprints. We found that a gradient
boosting machine learning model with this explicit feature attribution
can predict binding affinity with high accuracy. Finally, Shapley
values were derived to explain local feature importance. We found that
specific van der Waals (vdW) interactions of key protein residues
are pivotal for the predicted potency. Protein-specific and interpretable
prediction models can guide the optimization of many small molecule
drugs for improved potency.
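
To illustrate what a local Shapley explanation looks like, the following is a minimal sketch that computes exact Shapley values for a single sample by enumerating feature subsets. It is not the pipeline used in this work: the toy predictor `toy_affinity`, its three binary fingerprint bits, and its coefficients are hypothetical stand-ins for a trained gradient boosting model, and the all-zero baseline is an assumption made for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x, relative to a baseline sample.

    Features outside a coalition are set to their baseline value.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_affinity(features):
    """Hypothetical predictor: two vdW contact bits and one H-bond bit."""
    vdw_res_a, vdw_res_b, hbond = features
    return 5.0 + 2.0 * vdw_res_a + 1.0 * vdw_res_b + 0.5 * vdw_res_a * hbond


x = [1, 1, 1]          # all three interactions present in this complex
baseline = [0, 0, 0]   # assumed reference: no interactions present
phi = shapley_values(toy_affinity, x, baseline)

for name, value in zip(["vdW residue A", "vdW residue B", "H-bond"], phi):
    print(f"{name}: {value:+.2f}")
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the property that makes per-residue contributions directly comparable. For tree ensembles with many features, exhaustive enumeration is infeasible, and a tree-specific approximation would be used instead.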