Small, colloidally aggregating molecules (SCAMs) are the most common source of false positives in high-throughput screening (HTS) campaigns. Although SCAMs can be experimentally detected and suppressed by the addition of detergent in the assay buffer, detergent sensitivity is not routinely monitored in HTS. Computational methods are thus needed to flag potential SCAMs during HTS triage. In this study, we have developed and rigorously validated quantitative structure-interference relationship (QSIR) models of detergent-sensitive aggregation in several HTS campaigns under various assay conditions and screening concentrations. In particular, we have modeled detergent-sensitive aggregation in an AmpC β-lactamase assay, the preferred HTS counter-screen for aggregation, as well as in another assay that measures cruzain inhibition. Our models increase the accuracy of aggregation prediction by ∼53% in the β-lactamase assay and by ∼46% in the cruzain assay compared to previously published methods. We also discuss the importance of both assay conditions and screening concentrations in the development of QSIR models for various interference mechanisms besides aggregation. The models developed in this study are publicly available for fast prediction within the SCAM Detective web application (https://scamdetective.mml.unc.edu/).
Multiple approaches to quantitative structure-activity relationship (QSAR) modeling using various statistical or machine learning techniques and different types of chemical descriptors have been developed over the years. Models are often used in consensus to make more accurate predictions at the expense of model interpretation. We propose a simple, fast, and reliable method termed Multi-Descriptor Read Across (MuDRA) for developing both accurate and interpretable models. The method is conceptually related to the well-known kNN approach but uses different types of chemical descriptors simultaneously for similarity assessment. To benchmark the new method, we have built MuDRA models for six different end points (Ames mutagenicity, aquatic toxicity, hepatotoxicity, hERG liability, skin sensitization, and endocrine disruption) and compared the results with those generated with conventional consensus QSAR modeling. We find that models built with MuDRA show consistently high external accuracy similar to that of conventional QSAR models. However, MuDRA models excel in terms of transparency, interpretability, and computational efficiency. We posit that due to its methodological simplicity and reliable predictive accuracy, MuDRA provides a powerful alternative to much more complex consensus QSAR modeling. MuDRA is implemented and freely available at the Chembench web portal (https://chembench.mml.unc.edu/mudra).
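The idea of a kNN-like read-across that draws on several descriptor types at once can be illustrated with a minimal Python sketch. This is not the published MuDRA implementation: the function names, the use of Tanimoto similarity on sets of on-bits, the choice of k, and the similarity-weighted averaging of neighbor labels are all assumptions made for illustration, and the fingerprints are toy data rather than real chemical descriptors.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]
# Hypothetical representation: each training compound carries one fingerprint
# per descriptor type plus a binary activity label (1 = active, 0 = inactive).
Compound = Tuple[Dict[str, Fingerprint], int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def multi_descriptor_predict(
    query: Dict[str, Fingerprint], training: List[Compound], k: int = 1
) -> float:
    """Similarity-weighted mean label of the k nearest neighbors found
    separately in each descriptor space (an assumed aggregation rule)."""
    neighbors: List[Tuple[float, int]] = []
    for name, query_fp in query.items():
        scored = sorted(
            ((tanimoto(query_fp, fps[name]), label) for fps, label in training),
            key=lambda pair: pair[0],
            reverse=True,
        )
        neighbors.extend(scored[:k])  # keep the top-k from this descriptor space
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return 0.5  # no similar neighbor in any space: uninformative score
    return sum(sim * label for sim, label in neighbors) / total


# Toy data with two made-up descriptor types, "fp_a" and "fp_b".
training = [
    ({"fp_a": frozenset({1, 2, 3}), "fp_b": frozenset({10, 11})}, 1),
    ({"fp_a": frozenset({7, 8, 9}), "fp_b": frozenset({20, 21})}, 0),
]
query = {"fp_a": frozenset({1, 2, 4}), "fp_b": frozenset({20, 21, 22})}

score = multi_descriptor_predict(query, training, k=1)
print(round(score, 3))  # 0.429
```

In this toy case the query's nearest neighbor is the active compound in one descriptor space and the inactive compound in the other, so the prediction can be traced back to two named neighbors and their similarities, which is the kind of interpretability the abstract emphasizes.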
The main protease (Mpro) of SARS-CoV-2 has been proposed as one of the major drug targets for COVID-19. We have identified the experimental data on the inhibitory activity of compounds tested against the closely related (96% sequence identity, 100% active site conservation) Mpro of SARS-CoV. We developed QSAR models of these inhibitors and employed these models for virtual screening of all drugs in the DrugBank database. Similarity searching and molecular docking were explored in parallel, but docking failed to correctly discriminate between experimentally active and inactive compounds, so it was not relied upon for prospective virtual screening. Forty-two compounds were identified by our models as consensus computational hits. Subsequent to our computational studies, NCATS reported the results of experimental screening of their drug collection in a SARS-CoV-2 cytopathic effect assay (https://opendata.ncats.nih.gov/covid19/). Coincidentally, NCATS tested 11 of our 42 hits, and three of them, cenicriviroc (AC50 of 8.9 μM), proglumetacin (tested twice independently, with AC50 of 8.9 μM and 12.5 μM), and sufugolix (AC50 of 12.6 μM), were shown to be active. These observations support the value of our modeling approaches and models for guiding the experimental investigations of putative anti-COVID-19 drug candidates. All data and models used in this study are publicly available via Supplementary Materials, GitHub (https://github.com/alvesvm/sars-cov-mpro), and the Chembench web portal (https://chembench.mml.unc.edu/).