Background:
Drug discovery is a complex and expensive process involving several
time-consuming and costly phases through which potential new pharmaceutical compounds must pass to get
approved. One of these critical steps is the identification and optimization of lead compounds,
which has been made more accessible by the introduction of computational methods, including
deep learning (DL) techniques. Diverse DL model architectures have been proposed to learn the
vast landscape of interactions between proteins and ligands and to predict their affinity, helping in the
identification of lead compounds.
Objective:
This survey fills a gap in previous research by comprehensively analyzing the most
commonly used datasets and discussing their quality and limitations. It also offers a detailed
classification of the most recent DL methods for protein-ligand binding affinity
prediction (BAP), providing a fresh perspective on this evolving field.
Methods:
We thoroughly examine commonly used datasets for BAP and their inherent characteristics.
Our exploration extends to the preprocessing steps and DL techniques reported in the literature, including graph
neural networks, convolutional neural networks, and transformers.
We conducted an extensive literature search to ensure that the most recent deep learning approaches
for BAP available at the time of writing were included.
Results:
Our systematic analysis highlights challenges inherent to
DL-based BAP, such as data quality, model interpretability, and explainability, and we propose considerations
for future research directions. We present insights intended to accelerate the development
of more effective and reliable DL models for BAP within the research community.
Conclusion:
The present study can inform future research on predicting the binding affinity between
protein and ligand molecules, thereby supporting improvements to the overall drug development process.