Identifying protein–ligand interactions plays a key role in biochemical research and drug discovery. Although deep learning has recently shown great promise in discovering new drugs, a gap remains between deep learning-based and experimental approaches. Here, we propose a novel framework, named AIMEE, that integrates an AI model with enzymological experiments to identify inhibitors of the 3C-like protease (3CLpro) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a virus that has taken a heavy toll worldwide. Screening a bioactive chemical library over two rounds of experiments, we identified six novel inhibitors with a hit rate of 29.41%; four of them showed IC50 values below 3 μM. Moreover, we explored the interpretability of the central model in AIMEE, mapping the features extracted by deep learning to domain knowledge of chemical properties. Guided by this knowledge, we selected a commercially available compound and showed that it acts as an activity-based probe of 3CLpro. This work highlights the great potential of combining deep learning models and biochemical experiments for intelligent iteration and for expanding the boundaries of drug discovery. The code and data are available at https://github.com/SIAT-code/AIMEE.
De novo drug design plays an important role in drug discovery. Recently, deep learning-based methods have become a popular and promising approach for designing novel drugs with desirable properties. However, conventional target-specific generative models focus mainly on known inhibitors and therefore produce similar molecules, and such derivatives of known inhibitors are likely to be inactive against the same target. Given the cost of chemical synthesis and experimental validation, a low false-positive rate among generated molecules is essential. In this paper, we propose an efficient pipeline for generating novel SARS-CoV-2 3C-like (3CL) protease inhibitors. Using a GPT-2 generator together with a well-performing multi-task predictor that achieves high precision on the highly imbalanced 3CL in vitro screening dataset (650 positives among 297,467 molecules), we obtained a number of novel compounds targeting 3CL and analyzed their molecular properties. We also applied randomized SMILES to augment the positive molecules, giving the generator a larger chemical space to learn from. Finally, we present the selected positive compounds with desirable properties, together with their nearest neighbors among 3CL inhibitors that have already been verified in vitro.
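Pairing each selected compound with its nearest verified 3CL inhibitor amounts to a nearest-neighbor search in chemical space. The following is a minimal sketch of such a lookup, assuming molecules are encoded as binary fingerprints (represented here as sets of "on" bit indices) and compared by Tanimoto similarity. Tanimoto similarity is a common choice for this task, but the abstract does not state which metric or fingerprint the authors used. The molecule names, fingerprints, and the `novelty_cutoff` threshold are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: a fingerprint is the set of "on" bit indices
# of a binary molecular fingerprint.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity |A & B| / |A | B| of two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbor(query: Fingerprint,
                     reference: Dict[str, Fingerprint]) -> Tuple[str, float]:
    """Return (name, similarity) of the most similar molecule in a non-empty reference set."""
    name, fp = max(reference.items(), key=lambda kv: tanimoto(query, kv[1]))
    return name, tanimoto(query, fp)


def report(candidates: Dict[str, Fingerprint],
           known_inhibitors: Dict[str, Fingerprint],
           novelty_cutoff: float = 0.4) -> List[Tuple[str, str, float, bool]]:
    """For each candidate, list its nearest known inhibitor, the similarity,
    and whether it counts as novel (similarity below the hypothetical cutoff)."""
    rows = []
    for name, fp in candidates.items():
        nn_name, sim = nearest_neighbor(fp, known_inhibitors)
        rows.append((name, nn_name, sim, sim < novelty_cutoff))
    return rows


if __name__ == "__main__":
    # Toy fingerprints, for illustration only.
    known = {"inhib_A": frozenset({1, 2, 3, 4}), "inhib_B": frozenset({10, 11, 12})}
    generated = {"gen_1": frozenset({1, 2, 3, 5}), "gen_2": frozenset({11, 20, 21})}
    for cand, nn, sim, novel in report(generated, known):
        print(f"{cand}: nearest={nn} similarity={sim:.2f} novel={novel}")
    # gen_1 is close to inhib_A (0.60, not novel); gen_2 is far from both (0.20, novel).
```

In a real pipeline the fingerprints would come from a cheminformatics toolkit rather than hand-written sets, and the cutoff would be tuned to how much structural novelty is wanted relative to the known inhibitors.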