Jyotsna Bahl scite author profile

Selection of the most significant basis functions to perform spline regressions is an extremely challenging problem in quantitative structure-activity relationship studies. Normally, spline-based regression models are derived either incrementally or using genetic algorithms, and they may not provide optimal solutions. To address this issue in a systematic way, we describe herein a novel variable selection method, the random replacement method (RRM), which combines the principles of replacement methods (RMs) and genetic algorithms. We applied RRM to the selection of variables in multiple linear regression on two model data sets and showed that the method outperforms two other approaches: variable selection and model building using prediction, and ant colony optimization. We extended the application of RRM to the selection of basis functions in spline regression; this approach is named random function approximation (RFA). We compared the performance of RFA with that of multivariate adaptive regression splines and genetic function approximation and demonstrated the improved performance of the proposed method and the quality of the generated models based on the coefficients of determination R² and Q²(loo).
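The abstract does not give the details of RRM, so the following is only a minimal sketch of the general idea of replacement-style variable selection with random moves, not the published algorithm. It starts from a random subset of variables and repeatedly swaps one selected variable for a randomly chosen unselected one, keeping the swap only if the residual sum of squares of the linear fit decreases. The function names, the acceptance rule, the iteration count, and the synthetic data are all assumptions made for illustration.

```python
import random

def fit_rss(X, y, cols):
    """Residual sum of squares of a least-squares fit of y on the chosen
    columns of X plus an intercept (normal equations, Gauss-Jordan)."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(cols) + 1
    # Augmented normal-equation system [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(M[r][i]))
        if abs(M[piv][i]) < 1e-12:
            return float("inf")  # collinear subset: treat as unusable
        M[i], M[piv] = M[piv], M[i]
        for r in range(p):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * b for a, b in zip(M[r], M[i])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, y))

def random_replacement_search(X, y, n_select, n_iter=200, seed=0):
    """Illustrative randomized single-swap search (hypothetical helper).
    Requires more candidate variables than n_select."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    current = rng.sample(range(n_vars), n_select)
    best_rss = fit_rss(X, y, current)
    for _ in range(n_iter):
        pos = rng.randrange(n_select)  # which selected variable to replace
        candidate = rng.choice([v for v in range(n_vars) if v not in current])
        trial = current[:pos] + [candidate] + current[pos + 1:]
        rss = fit_rss(X, y, trial)
        if rss < best_rss:  # keep the swap only if the fit improves
            current, best_rss = trial, rss
    return sorted(current), best_rss

# Synthetic data: y depends only on variables 0 and 3 (an assumption for the demo).
data_rng = random.Random(1)
X = [[data_rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(30)]
y = [1.0 + 2.0 * row[0] - 3.0 * row[3] for row in X]

selected, rss = random_replacement_search(X, y, n_select=2)
print("selected variables:", selected, "RSS:", rss)
```

A real application would score subsets with a cross-validated statistic such as Q²(loo) rather than the training RSS, which always favours larger models.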


Prediction of skin sensitization potential using D-optimal design and GA-kNN classification methods

Gunturi, Theerthala, Patel, et al. 2010

SAR and QSAR in Environmental Research


Modelling of skin sensitization data of 255 diverse compounds and 450 calculated descriptors was performed to develop global predictive classification models that are applicable to the whole chemical space. With this aim, we employed two automated procedures: (a) D-optimal design to select optimal members of the training and test sets, and (b) the k-Nearest Neighbour (kNN) classification method combined with Genetic Algorithms (GA-kNN classification) to select significant and independent descriptors for building the models. This methodology helped us to derive multiple models, M1-M5, that are stable and robust. The best among them, model M1 (CCR(train) = 84.3%, CCR(test) = 87.2% and CCR(ext) = 80.4%), is based on six neighbours and nine descriptors. It performs better than the models reported in the literature and suggests that the combination of D-optimal design and GA-kNN classification is a very promising approach. Consensus prediction based on the models M1-M5 improved the CCR of the training, test and external validation datasets by 3.8%, 4.45% and 3.85%, respectively, over M1. From the analysis of the physical meaning of the selected descriptors, it is inferred that the skin sensitization potential of small organic compounds can be accurately predicted using calculated descriptors that code for the following fundamental properties: (i) lipophilicity, (ii) atomic polarizability, (iii) shape, (iv) electrostatic interactions, and (v) chemical reactivity.
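The pieces named in this abstract (kNN classification on a selected descriptor subset, a CCR score, and consensus prediction over several models) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: CCR is taken here to be the plain percentage of correctly classified compounds, consensus is taken to be a majority vote across models, and the tiny data set and all function names are hypothetical. The genetic-algorithm descriptor search and the D-optimal split are not shown.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k, descriptor_idx):
    """Classify `query` by majority vote among its k nearest training
    compounds, measuring Euclidean distance on the selected descriptors only.
    Vote ties go to the class whose member appears first among the neighbours."""
    def dist(a, b):
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in descriptor_idx))
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def ccr(y_true, y_pred):
    """Percentage of correctly classified compounds (assumed definition)."""
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def consensus(predictions_per_model):
    """Majority vote over the per-compound predictions of several models."""
    return [Counter(col).most_common(1)[0][0]
            for col in zip(*predictions_per_model)]

# Hypothetical toy data: two descriptors per compound, binary class labels.
train_X = [[0.0, 5.0], [0.2, 1.0], [0.1, 9.0],
           [1.0, 2.0], [0.9, 7.0], [1.1, 4.0]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.15, 3.0], [0.95, 8.0]]
test_y = [0, 1]

# A "model" here is just a choice of k and a descriptor subset.
pred = [knn_predict(train_X, train_y, q, k=3, descriptor_idx=[0]) for q in test_X]
print("CCR(test):", ccr(test_y, pred))
```

In a GA-kNN setting, the descriptor subset passed as `descriptor_idx` and the value of `k` would be the quantities the genetic algorithm optimizes against a CCR-type fitness.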


Discovering the Knowledge in Unstructured Early Drug Development Data Using NLP and Advanced Analytics

Koneti, Das, Bahl, et al. 2022



Jyotsna Bahl

Novel algorithm to select basis functions in spline regression: applications in quantitative structure–activity relationship studies
