The success of machine learning-based drug discovery depends on molecular representation. Yet traditional molecular fingerprints omit both the protein and the pointers back to structural information that would enable better model interpretability. We therefore propose LUNA, a Python 3 toolkit that calculates protein-ligand interactions and encodes them into new hashed fingerprints inspired by the Extended Connectivity FingerPrint (ECFP): the Extended Interaction FingerPrint (EIFP), the Functional Interaction FingerPrint (FIFP), and the Hybrid Interaction FingerPrint (HIFP). LUNA also provides visual strategies to make the fingerprints interpretable. We performed three major experiments exploring the fingerprints’ use. First, we trained machine learning models to reproduce DOCK3.7 scores using 1 million docked Dopamine D4 complexes, and found that EIFP-4,096 (R² = 0.61) outperformed related molecular and interaction fingerprints. Second, we used LUNA to support interpretable machine learning models. Finally, we demonstrate that interaction fingerprints can accurately identify similarities across molecular complexes that other fingerprints overlook. Hence, we envision LUNA and its interface fingerprints as promising methods for machine learning-based virtual screening campaigns. LUNA is freely available at https://github.com/keiserlab/LUNA.
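To make the idea of a hashed, fixed-length fingerprint concrete, the following is a minimal sketch of the general hash-and-fold step and of a Tanimoto comparison between two complexes. It is not LUNA's API or its actual algorithm, which builds ECFP-style features from the computed interactions. The function names `fold_features` and `tanimoto`, the SHA-256 hash, and the interaction descriptor strings are all hypothetical and chosen only for illustration; the 4,096-bit length mirrors the EIFP-4,096 setting mentioned above.

```python
import hashlib


def fold_features(features, n_bits=4096):
    """Hash each feature string and fold it into a set of 'on' bit positions.

    Assumption: features are plain strings; a real interaction fingerprint
    would derive them from the 3D structure of the complex.
    """
    on_bits = set()
    for feature in features:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        # Use the first 8 bytes as an integer and fold it into n_bits positions.
        on_bits.add(int.from_bytes(digest[:8], "big") % n_bits)
    return on_bits


def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' bits."""
    if not bits_a and not bits_b:
        return 0.0
    return len(bits_a & bits_b) / len(bits_a | bits_b)


# Hypothetical interaction descriptors for two protein-ligand complexes.
complex_a = ["hbond|residue_1|donor", "aromatic_stack|residue_2", "hydrophobic|residue_3"]
complex_b = ["hbond|residue_1|donor", "aromatic_stack|residue_2", "hydrophobic|residue_4"]

fp_a = fold_features(complex_a)
fp_b = fold_features(complex_b)
similarity = tanimoto(fp_a, fp_b)
print(f"Tanimoto similarity: {similarity:.2f}")
```

Because each bit is produced by hashing a feature, a tool can keep a lookup from bit position back to the features that set it. That kind of pointer back to structure is what the abstract refers to when it says the fingerprints support interpretability, although how LUNA stores this mapping is not shown here.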