Predicting protein–nucleic acid (PNA) binding affinity
solely from sequence is of paramount importance for the experimental
design and analysis of PNA interactions (PNAIs). Many current
models for binding affinity prediction are limited to specific PNAIs
and require both sequence and structural information of the PNA
complexes for training and testing, as well as at prediction time.
Because available PNA complex structures are scarce, the resulting
small training data sets significantly limit the diversity and
generalizability of these models. Additionally, most tools
predict a single parameter, such as binding affinity or free energy
changes upon mutation, which makes them less versatile.
Hence, we propose DeePNAP, a machine learning-based model built from
a vast and heterogeneous data set with 14,401 entries (from both eukaryotes
and prokaryotes) from the ProNAB database, consisting of wild-type
and mutant PNA complex binding parameters. Our model precisely predicts
the binding affinity and free energy changes due to the mutation(s)
of PNAIs exclusively from their sequences. While other similar tools
extract features from both sequence and structure information, DeePNAP
employs sequence-based features to yield high correlation coefficients
between the predicted and experimental values with low root mean squared
errors for PNA complexes in predicting K_D and ΔΔG, implying the generalizability
of DeePNAP. We have also developed a web interface hosting
DeePNAP that can serve as a powerful tool to rapidly predict binding
affinities for a myriad of PNAIs with high precision, supporting
a deeper understanding of their implications in various biological
systems. Web interface:
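
For orientation, the two predicted quantities and the two evaluation metrics named above can be related in a few lines. The sketch below is illustrative only and is not DeePNAP code: the function names are hypothetical, K_D is assumed to be in mol/L relative to a 1 M standard state, the temperature defaults to 298.15 K, and ΔΔG is assumed to follow the convention ΔG(mutant) − ΔG(wild type), so that a positive value indicates weakened binding. The data set may use a different sign convention or temperature.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def delta_g(kd_molar, temp_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L.

    Uses dG = R * T * ln(K_D), assuming a 1 M standard state.
    """
    return R_KCAL * temp_k * math.log(kd_molar)


def delta_delta_g(kd_mutant, kd_wild_type, temp_k=298.15):
    """ddG = dG(mutant) - dG(wild type); sign convention is an assumption."""
    return R_KCAL * temp_k * math.log(kd_mutant / kd_wild_type)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root mean squared error between predicted and observed values."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
```

Under these assumptions, a tenfold increase in K_D upon mutation corresponds to a ΔΔG of RT ln 10, roughly 1.36 kcal/mol at 298.15 K. Because K_D values span many orders of magnitude, correlation and error metrics are commonly computed on a logarithmic scale; whether DeePNAP does so is not stated here.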