Motivation
Identifying compound-protein interactions (CPIs) is a crucial task in drug discovery and chemogenomics studies. Proteins without a three-dimensional (3D) structure account for a large share of potential biological targets, which calls for methods that predict CPIs from protein sequence information alone. However, sequence-based CPI models face specific pitfalls, including inappropriate datasets, hidden ligand bias and inappropriate dataset splitting, all of which lead to overestimation of their prediction performance.
Results
To address these issues, we constructed new datasets specific to CPI prediction, proposed a novel transformer neural network named TransformerCPI, and introduced a more rigorous label reversal experiment to test whether a model learns true interaction features. TransformerCPI achieved much improved performance on the new experiments, and it can be deconvolved to highlight important interacting regions of protein sequences and compound atoms, which may provide chemical biology studies with useful guidance for further ligand structural optimization.
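As an illustration of the label reversal idea, the sketch below builds a split under one plausible reading of the experiment: every compound that reaches the test set appears in training only with the opposite label, so a model that memorizes ligand identity instead of interaction features should fail on the test set. The function name `label_reversal_split`, the `(compound, protein, label)` tuple layout and the random choice of training label are assumptions made for illustration. The abstract does not specify the protocol, and this is not the authors' implementation.

```python
import random
from collections import defaultdict


def label_reversal_split(pairs, seed=0):
    """Split CPI pairs so test compounds carry the label opposite to training.

    pairs: iterable of (compound, protein, label) with label in {0, 1}.
    Hypothetical helper for illustration only.
    """
    # Group each compound's pairs by label.
    by_compound = defaultdict(lambda: {0: [], 1: []})
    for compound, protein, label in pairs:
        by_compound[compound][label].append((compound, protein, label))

    rng = random.Random(seed)
    train, test = [], []
    for compound in sorted(by_compound):
        groups = by_compound[compound]
        # A compound seen with only one label cannot be reversed;
        # keep all of its pairs in the training set.
        if not groups[0] or not groups[1]:
            train.extend(groups[0] + groups[1])
            continue
        # Assumption: pick at random which label the compound shows in training.
        train_label = rng.choice([0, 1])
        train.extend(groups[train_label])
        test.extend(groups[1 - train_label])
    return train, test


if __name__ == "__main__":
    toy_pairs = [
        ("c1", "p1", 1), ("c1", "p2", 0),
        ("c2", "p1", 0), ("c2", "p3", 1),
        ("c3", "p2", 1),
    ]
    train_set, test_set = label_reversal_split(toy_pairs)
    print("train:", train_set)
    print("test:", test_set)
```

A model that reaches high accuracy on a conventional random split but performs poorly on such a split is likely relying on ligand bias.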
Supplementary information
Supplementary data are available at Bioinformatics online.
Availability and implementation
https://github.com/lifanchen-simm/transformerCPI
Alterations of discoidin domain receptor 1 (DDR1) may lead to increased production of inflammatory cytokines, making DDR1 an attractive target for inflammatory bowel disease (IBD) therapy. A scaffold-based molecular design workflow integrating a deep generative model, kinase selectivity screening and molecular docking was established and applied, leading to a novel DDR1 inhibitor, compound 2, which showed potent DDR1 inhibition (IC50 = 10.6 ± 1.9 nM) and excellent selectivity against a panel of 430 kinases (S(10) = 0.002 at 0.1 μM). Compound 2 potently inhibited the expression of pro-inflammatory cytokines and DDR1 autophosphorylation in cells, and it also demonstrated a promising oral therapeutic effect in a dextran sulfate sodium (DSS)-induced mouse colitis model.