Chemogenomics, also known as proteochemometrics, covers various computational methods for predicting interactions between related drugs and targets on large-scale data. Chemogenomics is used in the early stages of drug discovery to predict the off-target effects of proteins against therapeutic candidates. This study aims to predict unknown ligand–target interactions using one-dimensional SMILES as inputs for ligands and binding site residues for proteins in a computationally efficient manner. We first formulate a Deep learning CNN model using one-dimensional SMILES for drugs and motif-rich binding pocket subsequences of proteins as inputs. We evaluate and compare the proposed deep learning model trained on expert-based features against shallow feature-based machine learning methods. The proposed method achieved better or similar performance on the MSE and AUPR metrics than the shallow methods. Additionally, We show that our deep learning model, DeepPS is computationally more efficient than the deep learning model trained on full-length raw sequences of proteins. We conclude that a beneficial research approach would be to integrate structural information of proteins for modeling drug-target interaction prediction of large datasets for more interpretability, high throughput, and broad applicability.
Graphical abstract
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.