Human intestinal absorption (HIA) data for 175 diverse drugs were modeled with 336 calculated descriptors to develop global predictive models applicable to the whole medicinal chemistry space. To this end, we employed two automated procedures: (a) the Sphere Exclusion Algorithm (SEA), to select members of the training and test sets based on structural dissimilarity, and (b) the k-Nearest Neighbors (kNN) method combined with Genetic Algorithms (kNN-QSAR-GA), to select significant and independent descriptors. This methodology yielded optimal Quantitative Structure-Property Relationship (QSPR) models based on three and four descriptors. The best three-descriptor model is based on the Delta Chi Index of order 3 (Cluster), the hydrogen-type E-State index ShsOH, and AlogP99 (q ext …). A comparison of these statistics with those of reported models built using other approaches shows that: (a) the models are highly stable and robust, and (b) for the first time in HIA modeling, the combination of automated training-set selection (SEA) followed by variable selection (kNN-QSAR-GA) is a promising methodology for building multiple stable models that are useful in consensus prediction. From the analysis of the physical meaning of the selected descriptors, it is inferred that the HIA of small organic compounds can be accurately predicted using calculated descriptors that encode four fundamental properties: (1) lipophilicity, (2) hydrogen-bonding capacity, (3) size, and (4) shape. Further, the role of new calculated descriptors in the HIA profile of small organic compounds is uncovered. Finally, because the models reported herein are based on computed properties, they appear to be a valuable tool in virtual screening, where candidates must be selected and prioritized.
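A sphere-exclusion split of the kind used for training/test set selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: descriptors are range-scaled to [0, 1], dissimilarity is plain Euclidean distance, the first sphere centre is a caller-chosen index (`start`), compounds inside each sphere are assigned alternately to the test and training sets in order of distance, and each new centre is the unassigned compound most dissimilar to the current training set. The function name `sphere_exclusion_split` and its parameters are hypothetical.

```python
import numpy as np


def sphere_exclusion_split(X, radius, start=0):
    """Hypothetical sphere-exclusion split (assumed variant, see lead-in).

    X      : (n_compounds, n_descriptors) descriptor matrix
    radius : sphere radius in range-scaled descriptor space
    start  : index of the first sphere centre
    Returns sorted lists of training-set and test-set indices.
    """
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant descriptors contribute no distance
    Xs = (X - lo) / span   # range-scale every descriptor to [0, 1]

    unassigned = set(range(len(Xs)))
    train, test = [], []
    centre = start
    while True:
        # The sphere centre always joins the training set.
        unassigned.discard(centre)
        train.append(centre)
        d = np.linalg.norm(Xs - Xs[centre], axis=1)
        # Unassigned compounds inside the sphere, nearest first.
        members = sorted((i for i in sorted(unassigned) if d[i] <= radius),
                         key=lambda i: d[i])
        # Alternate sphere members between test and training sets.
        for k, i in enumerate(members):
            (test if k % 2 == 0 else train).append(i)
        unassigned.difference_update(members)
        if not unassigned:
            break
        # Next centre: the unassigned compound farthest from the training set.
        rest = sorted(unassigned)
        dmin = [np.linalg.norm(Xs[train] - Xs[i], axis=1).min() for i in rest]
        centre = rest[int(np.argmax(dmin))]
    return sorted(train), sorted(test)


if __name__ == "__main__":
    X = [[0.0], [0.1], [0.2], [5.0], [5.1], [10.0]]
    print(sphere_exclusion_split(X, radius=0.05))
```

The radius sets the balance of the split: with a radius of 0, every distinct compound ends up in the training set, while larger radii place more structurally redundant compounds in the test set.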
We collated hERG inhibition data for 165 compounds from the literature and employed two regression procedures, Local Lazy Regression (LLR) and k-Nearest Neighbor (kNN)-QSAR regression, in combination with Genetic Algorithms (GAs) to select significant and independent molecular descriptors and to build robust predictive models. This methodology yielded four optimal 2D- and 3D-QSPR models, M1 to M4, based on five descriptors. Extensive validation using the leave-one-out method and 61 compounds not used in model generation strongly suggests that: (i) models M1 and M2, based on LLR, are very stable and robust; (ii) model M2, based on 3D descriptors, performs better than M1, based on 2D descriptors; and (iii) the LLR method outperforms the kNN regression approach. These results indicate that combining GAs with LLR is a promising methodology for building multiple stable models that are useful in consensus prediction. Further, analysis of the physical meaning of the descriptors used in the best 2D and 3D models, M1 and M2, uncovers the significant physicochemical forces that determine the hERG inhibition profile of small organic compounds. Finally, because the models reported herein are based on computed properties, they appear to be a valuable tool in virtual screening, where candidates must be selected and prioritized.
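The contrast between the two regression procedures can be illustrated with a small sketch: kNN regression predicts the mean activity of the k nearest training compounds, whereas a local lazy regression fits an ordinary least-squares model on only the k nearest neighbours of each query and evaluates it there. Both are scored by leave-one-out q² = 1 − PRESS / Σ(y − ȳ)². Euclidean distance, unweighted neighbours, the choice of k, and the helper names `knn_predict`, `llr_predict`, and `loo_q2` are assumptions for illustration; the GA descriptor-selection step is omitted, and this is not the authors' code.

```python
import numpy as np


def knn_predict(X_train, y_train, x, k):
    """kNN regression: mean activity of the k nearest training compounds."""
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]
    return y_train[nn].mean()


def llr_predict(X_train, y_train, x, k):
    """Local lazy regression (assumed form): fit ordinary least squares on
    the k nearest training compounds only, then evaluate that fit at x."""
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]
    A = np.column_stack([np.ones(len(nn)), X_train[nn]])  # intercept + descriptors
    coef, *_ = np.linalg.lstsq(A, y_train[nn], rcond=None)
    return coef[0] + x @ coef[1:]


def loo_q2(X, y, predict, k):
    """Leave-one-out q2 = 1 - PRESS / sum((y - mean(y))**2)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    for i in range(len(y)):
        mask = np.arange(len(y)) != i  # hold compound i out
        preds[i] = predict(X[mask], y[mask], X[i], k)
    press = np.sum((y - preds) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


if __name__ == "__main__":
    # Toy data (not hERG data): activity exactly linear in two descriptors.
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(40, 2))
    y = 2 * X[:, 0] - X[:, 1]
    print("LLR q2:", loo_q2(X, y, llr_predict, k=8))
    print("kNN q2:", loo_q2(X, y, knn_predict, k=3))
```

Because the toy activity is exactly linear in the descriptors, each local least-squares fit reproduces it, so the LLR leave-one-out q² is 1 up to rounding, while kNN averaging flattens the local slope and scores lower. On real data the local fit needs at least as many neighbours as fitted coefficients (here, descriptors plus an intercept).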