Identifying the molecular descriptors that capture the chemical information underlying druglikeness would be a step forward for data-driven drug discovery and development. In this study, over 4000 Dragon-type molecular descriptors were computed for approximately 2000 known drugs and 2000 surrogate nondrugs. Logistic regression (LogR) and random forest (RF) models were built to identify the descriptors that best classify a compound as a drug or a nondrug. Ten one-variable LogR models each achieved at least 70% prediction accuracy. A two-variable model based on HVcpx and MDDD correctly classified 85% of the test compounds. The best LogR model, with 89.0% prediction accuracy, identified the five most influential descriptors for druglikeness: the information index HVcpx; the topological index MDDD; the ring descriptor NNRS; X2A, the average connectivity index of order 2; and the walk and path count SRW05. The best RF model, built on only 10 weakly correlated descriptors, was 92.5% accurate, on par with RF and LogR models that used over 200 variables. Its descriptors were: molecular weight, MW; average molecular weight, AMW; rotatable bond fraction, RBF; percentage of carbon atoms, C%; maximal electrotopological negative variation, MAXDN; all-path Wiener index, Wap; structural information content index (neighborhood symmetry of order 1), SIC1; number of nitrogen atoms, nN; 2D Petitjean shape index, PJI2; and self-returning walk count of order 5, SRW05. Many of these descriptors are straightforward to interpret chemically and could serve as druglikeness filters in virtual high-throughput drug discovery.
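To make the two-variable LogR idea concrete, the sketch below fits a logistic model P(drug) = 1 / (1 + exp(-(b0 + b1·x1 + b2·x2))) and scores it by prediction accuracy on a held-out set. It is a minimal illustration under stated assumptions, not the authors' pipeline: the two features are only labeled after HVcpx and MDDD, the values are synthetic standardized numbers drawn from two Gaussian clusters (not real Dragon descriptors), and plain batch gradient descent on the log-loss stands in for whatever fitting software the study used. The helper names (`fit_logistic`, `synthetic_set`, etc.) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function: P(drug) = 1 / (1 + exp(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_score(w, x):
    # z = b0 + b1*x1 + ... + bk*xk; w[0] is the intercept
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic-regression weights by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, xi)) - yi  # derivative of log-loss w.r.t. z
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, X, threshold=0.5):
    # Label 1 = "drug", 0 = "nondrug"
    return [1 if sigmoid(linear_score(w, xi)) >= threshold else 0 for xi in X]


def accuracy(y_true, y_pred):
    # Fraction of compounds classified correctly
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def synthetic_set(n, rng):
    """Hypothetical standardized (HVcpx-like, MDDD-like) values; NOT real descriptor data."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.0 if label == 1 else -1.0
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = synthetic_set(400, rng)
X_test, y_test = synthetic_set(200, rng)

w = fit_logistic(X_train, y_train)
test_acc = accuracy(y_test, predict(w, X_test))
print(f"weights = {[round(v, 3) for v in w]}, test accuracy = {test_acc:.3f}")
```

The accuracy printed here reflects only the toy clusters and says nothing about the 85% reported for the real two-variable model; with actual descriptor tables, the same fit-then-score loop applies once each descriptor is standardized.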