The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has prompted
researchers to pivot their efforts to finding antiviral compounds and vaccines. In this
study, we focused on the human host cell transmembrane protease serine 2 (TMPRSS2),
which plays an important role in the viral life cycle by cleaving the spike protein to
initiate membrane fusion. TMPRSS2 is an attractive target and has received attention for
the development of drugs against SARS and Middle East respiratory syndrome. Starting
with comparative structural modeling and a binding model analysis, we developed an
efficient pharmacophore-based approach and performed a large-scale in silico database
screening for small-molecule inhibitors of TMPRSS2. The hits were evaluated in the
TMPRSS2 biochemical assay and the SARS-CoV-2 pseudotyped particle entry assay. A number
of novel inhibitors were identified, providing starting points for the further
development of drug candidates for the treatment of coronavirus disease 2019.
Computational methods to predict molecular properties regarding safety and toxicology represent alternative approaches to expedite drug development, screen environmental chemicals, and thus significantly reduce associated time and costs. There is strong need for, and interest in, the development of computational methods that yield reliable predictions of toxicity, and many approaches, including the recently introduced deep neural networks, have been leveraged towards this goal. Herein, we report on the collection, curation, and integration of data from the public data sets that were the source of the ChemIDplus database for systemic acute toxicity. These efforts generated the largest publicly available data set of its kind, comprising more than 80,000 compounds measured against a total of 59 acute systemic toxicity end points. These data were used to develop multiple single- and multitask models utilizing random forest, deep neural networks, convolutional, and graph convolutional neural network approaches. For the first time, we also report consensus models based on different multitask approaches. To the best of our knowledge, prediction models for 36 of the 59 end points have never been published before. Furthermore, our results demonstrate significantly better performance for the consensus model obtained from three multitask learning approaches, which in particular predicted the 29 smaller tasks (fewer than 300 compounds) better than the other models developed in the study. The curated data set and the developed models have been made publicly available at https://github.com/ncats/ld50-multitask, https://predictor.ncats.io/, and https://cactus.nci.nih.gov/download/acute-toxicity-db (data set only) to support regulatory and research applications.
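As a rough illustration of the consensus idea, the sketch below combines per-end-point predictions from several multitask models by taking their arithmetic mean. The abstract does not say how the consensus was actually computed, so the averaging rule, the function name `consensus`, the `Predictions` type, and the end-point labels are all assumptions made for this example only.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical type: one model's predictions for a single compound,
# keyed by end-point name. None marks an end point the model did not predict.
Predictions = Dict[str, Optional[float]]


def consensus(model_outputs: List[Predictions]) -> Dict[str, float]:
    """Average per-end-point predictions across models, skipping missing values.

    Assumption: the consensus is an unweighted mean. The source does not
    specify the combination rule.
    """
    endpoints = set()
    for output in model_outputs:
        endpoints.update(output)

    result: Dict[str, float] = {}
    for endpoint in sorted(endpoints):
        values = [
            output[endpoint]
            for output in model_outputs
            if output.get(endpoint) is not None
        ]
        if values:  # leave out end points that no model predicted
            result[endpoint] = mean(values)
    return result


# Illustrative values only, not results from the study.
outputs = [
    {"endpoint_a": 2.0, "endpoint_b": 3.0},
    {"endpoint_a": 2.5, "endpoint_b": None},
    {"endpoint_a": 3.0},
]
combined = consensus(outputs)
```

An unweighted mean is only one possible choice; weighting each model by its validation performance on a given end point would be an equally plausible variant.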