Artificial neural networks (ANNs) were trained to model and predict the acidity constants (pK(a)) of 128 phenols with diverse chemical structures using a quantitative structure-activity relationship. An ANN with a 6-14-1 architecture was built from the six molecular descriptors that appear in the multi-parameter linear regression (MLR) model. The inputs are the polarizability term (pi(I)), the most positive charge of the acidic hydrogen atom (q+), the molecular weight (MW), the most negative charge of the phenolic oxygen atom (q-), the hydrogen-bond accepting ability (epsilon(B)), and the partial-charge weighted topological electronic (PCWTE) descriptor; the output is pK(a). A properly selected neural network trained on 106 phenols represented the dependence of the acidity constant on the molecular descriptors fairly well. To evaluate the predictive power of the ANN, the optimized network was used to predict the pK(a) values of the 22 compounds in the prediction set, which were not used in the optimization procedure. For the prediction set, the MLR model gave a squared correlation coefficient (R2) of 0.8950 and a root mean square error (RMSE) of 0.5621, compared with 0.99996 and 0.0114 for the ANN model. These improvements arise because the pK(a) of phenols correlates non-linearly with the molecular descriptors.
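To make the 6-14-1 notation concrete, the following is a minimal sketch of a forward pass through such a network: six descriptor inputs, fourteen hidden units, one pK(a) output. The sigmoid activation, the linear output unit, the random placeholder weights, and the function names (`init_network`, `count_parameters`, `predict`) are assumptions for illustration; the abstract does not specify them, and the example descriptor values are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic activation (an assumption; the abstract names no activation)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_in: int = 6, n_hidden: int = 14, seed: int = 0) -> dict:
    """Create random placeholder weights for an n_in-n_hidden-1 network."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                     for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b_out": 0.0,
    }


def count_parameters(net: dict) -> int:
    """Count adjustable weights and biases in the network."""
    n_hidden = len(net["w_hidden"])
    n_in = len(net["w_hidden"][0])
    # hidden weights + hidden biases + output weights + output bias
    return n_hidden * n_in + n_hidden + n_hidden + 1


def predict(net: dict, descriptors: list) -> float:
    """Forward pass: scaled descriptors -> hidden layer -> one linear output."""
    if len(descriptors) != len(net["w_hidden"][0]):
        raise ValueError("expected one value per input descriptor")
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, descriptors)) + b)
        for row, b in zip(net["w_hidden"], net["b_hidden"])
    ]
    return sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]


net = init_network()
# Hypothetical, already-scaled descriptor values for a single phenol.
example = [0.2, 0.7, 0.4, 0.1, 0.5, 0.3]
print(count_parameters(net))  # 6*14 + 14 + 14 + 1 = 113
print(predict(net, example))  # untrained output, not a real pK(a)
```

With untrained weights the output is meaningless; in practice the weights would be fitted on the training compounds and checked against the held-out prediction set.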
Genetic algorithm–multiparameter linear regression (GA-MLR) and genetic algorithm–artificial neural network (GA-ANN) global models were used to predict the toxicity of phenols to Tetrahymena pyriformis. The data set was divided into a training set of 150 molecules, a validation set of 50 molecules, and a prediction set of 50 molecules. A large number of descriptors were calculated, and the genetic algorithm was used to select the variables that gave the best-fitting models. The six molecular descriptors selected were used as inputs for the models. The MLR model was validated using leave-one-out cross-validation, leave-group-out cross-validation, and an external test set. A three-layered feed-forward ANN with back-propagation of error was generated using the six molecular descriptors appearing in the MLR model. Comparison of the results from the two models showed that the ANN model is superior to the MLR model. The root mean square errors of the training, validation, and prediction sets for the ANN model were 0.224, 0.202, and 0.224, with squared correlation coefficients (r2) of 0.926, 0.943, and 0.925, respectively. The improvements arise from non-linear correlations between the toxicity of the compounds and the selected descriptors. The prediction ability of the GA-ANN global model is much better than that of previously proposed models.
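The two statistics reported for each set can be computed as in the sketch below. It assumes that r2 means the squared Pearson correlation between observed and predicted values, which the abstract does not state explicitly; the function names and the sample values are hypothetical.

```python
import math


def rmse(observed: list, predicted: list) -> float:
    """Root mean square error between observed and predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def squared_correlation(observed: list, predicted: list) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov * cov / (var_o * var_p)


# Hypothetical values: every prediction is offset by +0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(rmse(observed, predicted))                 # 0.5
print(squared_correlation(observed, predicted))  # 1.0
```

The example shows why both numbers are worth reporting together: a constant offset leaves the squared correlation at 1 while the RMSE still registers the error.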
Principal component-genetic algorithm-multiparameter linear regression (PC-GA-MLR) and principal component-genetic algorithm-artificial neural network (PC-GA-ANN) models were applied to predict the melting points of 323 drug-like compounds. A large number of theoretical descriptors were calculated for each compound. The first 234 principal components (PCs) were found to explain more than 99.9% of the variance in the original data matrix. From this pool of PCs, the genetic algorithm was used to select the best set of extracted PCs for the PC-MLR and PC-ANN models. The models were generated using fifteen PCs as variables. To evaluate the predictive power of the models, the melting points of the 64 compounds in the prediction set were calculated. The root mean square errors (RMSE) for the PC-GA-MLR and PC-GA-ANN models are 48.18 and 12.77 °C, respectively. Comparison of the results shows that the PC-GA-ANN model is superior to both the PC-GA-MLR model and recently proposed models (RMSE = 40.7 °C). The improvements arise because the melting point of the compounds correlates non-linearly with the principal components.
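The count of components needed to pass a variance threshold, such as the 99.9% figure above, follows from the cumulative explained-variance ratio. The sketch below assumes the PC variances (eigenvalues of the covariance matrix) are already available; the function name `components_needed` and the eigenvalues are hypothetical, not data from the study.

```python
def components_needed(eigenvalues: list, threshold: float = 0.999) -> int:
    """Smallest number of leading PCs whose cumulative variance share exceeds threshold."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total variance must be positive")
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total > threshold:
            return k
    # Threshold not exceeded even with every component (e.g. threshold >= 1).
    return len(ordered)


# Hypothetical PC variances; cumulative shares are 0.50, 0.80, 0.95, 0.99, 1.00.
eigenvalues = [5.0, 3.0, 1.5, 0.4, 0.1]
print(components_needed(eigenvalues, threshold=0.9))  # 3
print(components_needed(eigenvalues))                 # 5
```

This step only fixes the size of the candidate pool; which PCs then enter the model is a separate selection, done in the abstract by the genetic algorithm.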