Purpose
– The purpose of this paper is to predict the retention times of 84 pesticides or toxicants.
Design/methodology/approach
– Quantitative structure-retention relationship (QSRR) analysis was performed on a set of 84 pesticides or toxicants using a hybrid genetic algorithm/multiple linear regression (GA/MLR) approach.
Findings
– A model with six descriptors was developed by applying the GA/MLR approach, using as independent variables theoretical descriptors derived from the Spartan and Dragon software.
Originality/value
– A six-parameter linear model was developed by GA/MLR, with R² of 90.54, Q² of 88.15 and S of 0.0381 in log units. Several validation techniques, including leave-many-out cross-validation, a randomization test, and validation through the test set, illustrated the reliability of the proposed model. All of the descriptors involved can be calculated directly from the molecular structure of the compounds; thus the proposed model is predictive and could be used to estimate the retention times of pesticides or toxicants.
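As a rough illustration of the leave-many-out cross-validation mentioned above, the sketch below computes a cross-validated Q² for a one-descriptor least-squares model. The data values, the function names (`fit_line`, `q2_leave_many_out`), the interleaved grouping and the choice of the full-set mean in the denominator are all assumptions made for illustration; the paper's six-descriptor model and its exact Q² convention are not reproduced here.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x for a single descriptor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def q2_leave_many_out(x, y, n_groups=3):
    """Cross-validated Q2 = 1 - PRESS / SS_tot.

    Each group of compounds is left out in turn, the model is refitted on
    the rest, and the left-out compounds are predicted. Groups are formed
    by interleaving indices (an assumption; random groups are also common).
    """
    n = len(y)
    press = 0.0
    for g in range(n_groups):
        held_out = [i for i in range(n) if i % n_groups == g]
        kept = [i for i in range(n) if i % n_groups != g]
        a, b = fit_line([x[i] for i in kept], [y[i] for i in kept])
        press += sum((y[i] - (a + b * x[i])) ** 2 for i in held_out)
    y_mean = mean(y)  # assumption: total sum of squares uses the full-set mean
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and log retention times (not from the paper).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
log_rt = [0.52, 0.61, 0.68, 0.80, 0.91, 0.98, 1.12, 1.19, 1.31]
q2 = q2_leave_many_out(descriptor, log_rt, n_groups=3)
print(f"Q2 (leave-many-out, 3 groups) = {q2:.3f}")
```

A Q² close to R² suggests the fit does not depend heavily on any one group of compounds, which is the kind of evidence the validation step is meant to provide.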
The study compares two closely related alternative methods: a non-parametric method, least absolute deviation (LAD) regression, and the traditional ordinary least squares (OLS) method. Both were applied to model, separately, the retention indices of the same set of 35 pyrazines (27 pyrazines together with 8 other pyrazines in the same set) eluted on OV-101 and Carbowax-20M columns, using theoretical molecular descriptors calculated with the DRAGON software. The detection of influential observations for the non-parametric LAD method is a problem that has been studied extensively, and it offers alternative approaches whose main feature is robustness; the method is presented here and compared with standard least squares regression. The comparison between LAD and OLS is based on the equation of the hyperplane, in order to confirm robustness and thereby detect outliers and leverage points. The results were validated with statistical tests (Anderson-Darling, Shapiro-Wilk, D'Agostino and Jarque-Bera), a graphical test (frequency histogram) and confidence intervals, to check whether the distribution of the errors is approximately normal.
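The contrast between LAD and OLS can be sketched on a toy one-descriptor problem. The example below is a minimal illustration, not the study's procedure: the data are invented, the function names are hypothetical, and the LAD fit uses an exhaustive search over lines through pairs of points, which is exact for a single predictor because at least one minimiser of the sum of absolute residuals passes through two data points.

```python
from itertools import combinations
from statistics import mean


def ols_line(x, y):
    """Least squares fit of y = a + b*x (minimises squared residuals)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def lad_line(x, y):
    """Least absolute deviation fit of y = a + b*x.

    Tries every line through two data points and keeps the one with the
    smallest sum of absolute residuals. Cost is O(n^3), so this is only
    suitable for small data sets.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a valid regression line
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        cost = sum(abs(yk - (a + b * xk)) for xk, yk in zip(x, y))
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best[1], best[2]


# Invented data: seven points on y = 1 + 2x and one aberrant last point.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0 * xi + 1.0 for xi in x]
y[-1] = 40.0  # outlier (the clean value would be 17.0)

ols_a, ols_b = ols_line(x, y)
lad_a, lad_b = lad_line(x, y)
print(f"OLS: intercept={ols_a:.3f}, slope={ols_b:.3f}")
print(f"LAD: intercept={lad_a:.3f}, slope={lad_b:.3f}")
```

In this toy data the LAD line stays on the seven consistent points, while the OLS slope is pulled towards the aberrant one; large residuals from the LAD line then flag that observation as suspect.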
Purpose
– The purpose of this paper is to predict the aquatic toxicity (LC50) of 92 substituted benzene derivatives towards Pimephales promelas.
Design/methodology/approach
– Quantitative structure-activity relationship analysis was performed on a series of 92 substituted benzene derivatives using multiple linear regression (MLR), artificial neural network (ANN) and support vector machine (SVM) methods, which correlate the aquatic toxicity (LC50) values of these chemicals with their structural descriptors. First, the entire data set was split with the Kennard and Stone algorithm into a training set (74 chemicals) and a test set (18 chemicals) for external statistical validation.
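The Kennard and Stone split can be sketched as follows. This is a minimal illustration under stated assumptions: Euclidean distance in descriptor space is assumed (the abstract does not name the metric), the function name `kennard_stone` and the six two-descriptor points are hypothetical, and the real study selected 74 of 92 chemicals rather than 4 of 6.

```python
from math import dist


def kennard_stone(points, n_train):
    """Return the indices chosen for the training set, in selection order.

    The two most distant points are selected first; after that, the point
    whose nearest already-selected neighbour is farthest away is added,
    until n_train points have been chosen.
    """
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of points")
    # Seed with the most distant pair of points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in (first, second)]
    while len(selected) < n_train:
        # Max-min rule: maximise the distance to the closest selected point.
        nxt = max(
            remaining,
            key=lambda k: min(dist(points[k], points[s]) for s in selected),
        )
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Hypothetical two-descriptor coordinates for six chemicals.
descriptors = [
    (0.0, 0.0), (1.0, 0.0), (5.0, 0.0),
    (9.0, 0.0), (10.0, 0.0), (5.0, 4.0),
]
train_idx = kennard_stone(descriptors, n_train=4)
test_idx = [k for k in range(len(descriptors)) if k not in train_idx]
print("training set:", train_idx)
print("test set:", test_idx)
```

Because the rule spreads the training compounds across descriptor space, the test compounds tend to fall inside the region covered by the training set, which is one reason the method is popular for external validation splits.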
Findings
– Models with six descriptors were developed by applying the genetic algorithm-variable subset selection procedure, using as independent variables theoretical descriptors derived from the Dragon software.
Originality/value
– The values of Q² and RMSE in internal validation for the MLR, SVM and ANN models were (0.8829; 0.225), (0.8882; 0.222) and (0.8980; 0.214), respectively; in external validation they were (0.9538; 0.141), (0.947; 0.146) and (0.9564; 0.146). The statistical parameters obtained for the three approaches are very similar, which confirms that the six-parameter model is stable, robust and significant.