The performance of three novel QSAR algorithms, which combine principal component-artificial neural network (PC-ANN) modeling with three factor selection procedures, namely eigenvalue ranking, correlation ranking, and genetic algorithm (ER-PC-ANN, CR-PC-ANN, and PC-GA-ANN, respectively), is compared by applying these models to the prediction of the carcinogenic activity of a large, structurally diverse set of 735 drugs. A total of 1350 theoretical descriptors are calculated for each molecule. The matrix of calculated descriptors (735 x 1350) is subjected to principal component analysis (PCA). The first 137 principal components (PCs) explain 95% of the variance in the matrix. From this pool of 137 PCs, the factor selection methods (ER, CR, and GA) are employed to select the best set of PCs for PC-ANN modeling. In ER-PC-ANN, the PCs are entered into the ANN successively in order of decreasing eigenvalue. In CR-PC-ANN, the ANN is first used to model the nonlinear relationship between each PC and the carcinogenic activity separately; the PCs are then ranked by decreasing correlating ability and entered into the input layer of the network one after another. Finally, in PC-GA-ANN, a search algorithm (i.e., a genetic algorithm) is used to find the best set of PCs. Both external validation and cross-validation are used to assess the performance of the resulting models. The results obtained by the PC-GA-ANN and CR-PC-ANN procedures are superior to those obtained by ER-PC-ANN. The PC-GA-ANN algorithm performs better than CR-PC-ANN, but the difference is not significant.
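The component-retention and ranking steps described above can be illustrated with a minimal Python sketch. It assumes the eigenvalues and PC scores have already been obtained from a PCA. The function names and all numbers are hypothetical, and the absolute Pearson correlation is used here only as a simple stand-in for the single-input ANN that the study uses to measure each PC's correlating ability.

```python
import math

def n_components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of leading PCs whose cumulative variance share reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)

def eigenvalue_ranking(eigenvalues):
    """ER: PC indices in order of decreasing eigenvalue."""
    return sorted(range(len(eigenvalues)),
                  key=lambda i: eigenvalues[i], reverse=True)

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def correlation_ranking(scores, activity):
    """CR (simplified assumption): PC indices in order of decreasing |r| with activity.

    scores[i] holds the scores of PC i over all molecules.
    """
    return sorted(range(len(scores)),
                  key=lambda i: abs(pearson(scores[i], activity)),
                  reverse=True)

# Hypothetical toy data: 3 PCs scored on 4 molecules.
toy_eigenvalues = [6.0, 2.0, 1.0]
toy_scores = [
    [1.0, -1.0, 1.0, -1.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 2.0, 2.0, 1.0],
]
toy_activity = [1.0, 2.0, 3.0, 4.0]

er_order = eigenvalue_ranking(toy_eigenvalues)
cr_order = correlation_ranking(toy_scores, toy_activity)
print("ER order:", er_order)
print("CR order:", cr_order)
```

In this toy case the two orderings differ, which is the point of comparing ER with CR: the PC with the largest eigenvalue need not be the one most related to the activity.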
The usefulness of quantum chemical descriptors, calculated at the RHF level of theory with the 6-31G basis set, was examined for a QSAR study of 1,4-dihydropyridine-based calcium channel antagonists. A data set containing 45 dihydropyridine derivatives with known activity was used. Two modeling approaches were employed: multiple linear regression (MLR) combined with a genetic algorithm for variable selection, and an artificial neural network combined with principal component analysis for dimension reduction and a genetic algorithm for factor selection (PC-GA-ANN). Several multiparametric MLR equations of good statistical quality were obtained for different classes of dihydropyridine derivatives. The resulting equations suggest that the electronic properties of the atoms in the molecular backbone, as well as the conformation of the molecules, affect the binding of these molecules to their receptor. In PC-GA-ANN, the principal components of the descriptor data matrix were used as inputs to the neural network, and the genetic algorithm was then applied to select the most relevant set of principal components. Two ANN models, each with five selected principal components, were obtained. These models, which have high statistical quality, can predict the activity of the molecules with prediction errors lower than +/-5%.
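The idea of using a genetic algorithm to pick a subset of principal components can be sketched as follows. This is a generic bit-string GA with elitism, one-point crossover, and bit-flip mutation, not the authors' implementation. In the study the fitness of a subset would come from the performance of an ANN trained on the selected PCs; here a hypothetical toy fitness function replaces the network so that the example stays self-contained, and all names and parameter values are assumptions.

```python
import random

def ga_select(n_factors, fitness, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Return (selected PC indices, best fitness recorded per generation)."""
    rng = random.Random(seed)
    # Each chromosome is a list of bits: 1 means the PC is fed to the model.
    population = [[rng.randint(0, 1) for _ in range(n_factors)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]          # elitism: keep the best unchanged
        parents = population[:pop_size // 2]   # truncation selection
        while len(next_gen) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randint(1, n_factors - 1)            # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]                     # bit-flip mutation
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    selected = [i for i, bit in enumerate(best) if bit]
    return selected, history

# Hypothetical stand-in for "ANN performance with these PCs":
# reward three arbitrarily chosen informative PCs, penalize subset size.
INFORMATIVE = {0, 3, 7}

def toy_fitness(bits):
    reward = sum(1.0 for i in INFORMATIVE if bits[i])
    return reward - 0.1 * sum(bits)

selected, history = ga_select(n_factors=10, fitness=toy_fitness)
print("selected PCs:", selected)
print("best fitness per generation:", history)
```

Because the best chromosome is copied unchanged into each new generation, the recorded best fitness can never decrease. Replacing `toy_fitness` with a function that trains and validates a model on the selected PCs would be far more expensive per evaluation.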
A genetic algorithm-based artificial neural network model has been developed for the accurate prediction of the blood-brain barrier partitioning (on the logBB scale) of chemicals. A data set of 123 logBB values (115 old molecules and 8 new molecules) for a diverse set of chemicals was used in this study. The optimum 3D geometry of each molecule was estimated by ab initio calculations at the RHF/STO-3G level, and different electronic descriptors were then calculated for each molecule. In addition, logP, as a measure of hydrophobicity, and different topological indices were calculated. A three-layered artificial neural network with an error backpropagation learning algorithm was employed to model the nonlinear relationship between the calculated descriptors and the logBB data. A genetic algorithm was used as a feature selection method to select the most relevant set of descriptors as inputs to the network. Modeling the logBB data with the quantum descriptors alone produced a 5:4:1 ANN structure with RMS errors of validation and cross-validation equal to 0.224 and 0.227, respectively. A better nonlinear model (RMS(V) and RMS(CV) equal to 0.097 and 0.099, respectively) was obtained by adding logP and the principal components of the topological indices to the electronic descriptors. The final performance of the models was assessed by using them to predict the logBB of 23 molecules that took no part in any step of model development. The best model produced an RMS error of prediction of 0.140 and could predict about 98% of the variance in the logBB data.
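The two figures of merit quoted above, RMS error and the fraction of variance predicted, can be computed as in the minimal sketch below. The abstract does not state exactly how the explained variance was calculated, so the coefficient of determination (1 - SS_res / SS_tot) is used here as an assumption. The observed and predicted logBB values are made-up illustrative numbers, not data from the study.

```python
import math

def rms_error(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def explained_variance(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical logBB values for four held-out molecules.
observed_logbb = [0.0, 1.0, -1.0, 0.5]
predicted_logbb = [0.1, 0.9, -1.1, 0.6]

print("RMS error of prediction:", rms_error(observed_logbb, predicted_logbb))
print("explained variance:", explained_variance(observed_logbb, predicted_logbb))
```

Computing these statistics on molecules that played no role in training or feature selection is what makes them an estimate of predictive, rather than fitted, performance.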