Results from external validation are too unstable for the datasets we analyzed. Based on our findings, we recommend using the LOO procedure for validating QSAR predictive models built on high-dimensional small-sample data.
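The LOO (leave-one-out) procedure recommended above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the helper names `loo_accuracy` and `one_nn` are hypothetical, and a 1-nearest-neighbour rule stands in for whatever QSAR model is actually being validated. It only shows the resampling scheme, in which each compound is held out once and predicted from a model fitted to all the others.

```python
import random


def loo_accuracy(X, y, fit_predict):
    """Leave-one-out: hold out each sample once, train on the rest."""
    hits = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        # Count a hit when the held-out sample is predicted correctly.
        hits += fit_predict(X_train, y_train, X[i]) == y[i]
    return hits / len(X)


def one_nn(X_train, y_train, x):
    """Stand-in classifier (assumption): 1-nearest neighbour, squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_train]
    return y_train[dists.index(min(dists))]


# Synthetic "high-dimensional small-sample" data: 20 samples, 50 descriptors.
rng = random.Random(0)
n_samples, n_features = 20, 50
y = [i % 2 for i in range(n_samples)]
X = [[rng.gauss(3.0 * label, 1.0) for _ in range(n_features)] for label in y]

accuracy = loo_accuracy(X, y, one_nn)
print(f"LOO accuracy: {accuracy:.2f}")
```

Because every sample serves as the test case exactly once, the estimate does not depend on a single random train/test split, which is the source of instability that the statement above attributes to external validation on small datasets.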
In this paper we used two sets of calculated molecular descriptors to predict blood-brain barrier (BBB) entry for a collection of 415 chemicals. The first set of 579 descriptors was calculated with Schrodinger and TopoCluj software; the second set of 198 descriptors was calculated with Polly and Triplet software. Modelling and a two-deep, repeated external validation method were then used for QSAR formulation. Results show that both sets of descriptors, individually and in combination, give models of reasonable prediction accuracy. We also demonstrate the effectiveness of a variable selection approach by showing that, for one of our descriptor sets, the top 5% of predictors ranked by random forest variable importance yield a better-performing model than the model with all predictors. The most influential descriptors indicate important molecular structural features that govern BBB entry of chemicals.

Keywords: blood-brain barrier · molecular descriptors · variable selection · machine learning · random forest · two-deep cross validation · quantitative structure-activity relationship (QSAR)

[a] S. Majumdar