Background: Quantitative and qualitative structure–activity relationships (QSARs) have been used to understand chemical behavior for almost a century. The main source of QSAR models is the scientific literature, but how well these models are documented remains an open question.
Objectives: The main aim of this study was to critically analyze QSAR publication practices with regard to transparency, potential reproducibility, and independent verification, focusing on the technical completeness of the published QSARs.
Methods: A total of 1,533 QSAR articles reporting 79 individual endpoints, mostly in environmental and health science, were reviewed. The QSAR parameters required for technical completeness were grouped into five categories: chemical structures, experimental endpoint values, descriptor values, mathematical representation of the model, and predicted endpoint values. The data were summarized and discussed using Circos plots.
Results: Altogether, 42.5% of the reviewed articles were found to be potentially reproducible. Potential reproducibility varied between endpoint groups: 39% for physical and chemical properties, 52% for ecotoxicity, 56% for environmental fate, 30% for human health, and 32% for toxicokinetics. The reproducibility of QSARs is discussed and placed in the context of the reproducibility of the experimental methods. The article includes 65 references to open QSAR datasets as examples of models restored from scientific articles.
Discussion: Strikingly poor documentation of QSARs was observed. This reduces the transparency and availability of research results, and consequently their application in scientific, industrial, and regulatory areas. A list of the components needed to ensure best practices in QSAR reporting is provided, allowing long-term use and preservation of the models. The list also allows interested parties such as journal editors, reviewers, regulators, evaluators, and potential users to assess the reproducibility of models. https://doi.org/10.1289/EHP3264
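The five completeness categories can be turned into a simple per-article checklist. The sketch below is a minimal illustration, not the study's actual scoring procedure. The field names, the helper functions, and the rule that an article is "potentially reproducible" only when every required category is reported are all assumptions, because the abstract does not spell out the decision rule.

```python
from dataclasses import dataclass, fields


# Hypothetical field names, one per technical-completeness category.
@dataclass(frozen=True)
class QsarArticle:
    chemical_structures: bool
    experimental_values: bool
    descriptor_values: bool
    model_equation: bool
    predicted_values: bool

    def missing(self):
        """Names of the categories that the article does not report."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


def potentially_reproducible(article, required=None):
    """Assumed rule: reproducible only if every required category is
    reported. By default all five categories are required."""
    if required is None:
        required = {f.name for f in fields(article)}
    return set(required).isdisjoint(article.missing())


def reproducibility_rate(articles, required=None):
    """Percentage of articles judged potentially reproducible."""
    if not articles:
        return 0.0
    hits = sum(potentially_reproducible(a, required) for a in articles)
    return 100.0 * hits / len(articles)
```

Passing a smaller `required` set shows how the reported rate depends on which categories are treated as mandatory.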
The experimental EC50 toxicities toward Daphnia magna of a series of 130 benzoic acids, benzaldehydes, phenylsulfonyl acetates, cycloalkanecarboxylates, benzanilides, and other esters were studied using the Best Multilinear Regression (BMLR) algorithm implemented in CODESSA. A modified quantitative structure-activity relationship (QSAR) procedure was applied to guarantee the stability and reproducibility of the results. Splitting the initial data set into training and test subsets produced three independent models with an average R² of 0.735. A five-descriptor general model covering all 130 compounds was then built from the descriptors found effective for the independent subsets. It had the following statistics: R² = 0.712, R²cv = 0.676, F = 61.331, s² = 0.6. Removing two extreme outliers significantly improved these statistics: R² = 0.759, R²cv = 0.728, F = 77.032, s² = 0.499. The sensitivity of the general model to chance correlation was estimated with a scrambling procedure involving 20 randomizations of the original property values. The resulting R² of 0.192 demonstrates the high robustness of the proposed model. The descriptors appearing in the models are related to the biochemical nature of the adverse effects. An additional study of the EC50/LC50 relationship for a subset of 28 compounds from the general data set showed that the two endpoints are correlated with R² = 0.98.
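The reported F values can be cross-checked against R² with the standard overall F statistic for a linear model with k descriptors and n compounds, F = (R²/k) / ((1 − R²)/(n − k − 1)). For the general model (k = 5, n = 130) this gives about 61.3, close to the reported 61.331. Assuming the outlier-free model keeps five descriptors and 128 compounds, it gives about 76.8 against the reported 77.032; these small gaps are consistent with rounding of R².

The sketch also shows y-scrambling: the endpoint values are permuted, the same descriptors are refitted, and the mean R² is reported. Keeping the descriptor set fixed across permutations is an assumption. A full BMLR run might reselect descriptors for each permutation. The function names are illustrative.

```python
import numpy as np


def f_statistic(r2, n, k):
    """Overall F for a linear model with k descriptors fitted to n compounds."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def r_squared(X, y):
    """In-sample R² of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))


def y_scramble(X, y, n_rounds=20, seed=0):
    """Refit the same descriptors to randomly permuted endpoint values and
    return the mean R² over n_rounds permutations."""
    rng = np.random.default_rng(seed)
    scores = [r_squared(X, rng.permutation(y)) for _ in range(n_rounds)]
    return float(np.mean(scores))
```

A scrambled R² far below the unscrambled one suggests that the original fit is not a chance correlation.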
In this study, general and class-specific quantitative structure-property relationship (QSPR) models for soil sorption (log KOC) of 344 organic pollutants (0 < log KOC < 4.94) were developed. The models use a wide variety of theoretical molecular descriptors derived solely from molecular structure. Two general models were obtained. The first was derived for a structurally representative set of 68 chemicals (R² = 0.76, s = 0.44), and the second covered all 344 compounds (R² = 0.76, s = 0.41). The first model was validated on the data for the remaining 276 pollutants (R² = 0.70, s = 0.45). Both models were additionally validated on an independent set of 48 pollutants. Both predict log KOC at the level of experimental precision, and the theoretical molecular descriptors appearing in them give further insight into the mechanisms of soil sorption. The distribution of residuals in the log KOC values calculated by the two general models indicated the need for, and possible advantages of, modeling soil sorption on smaller data sets for individual classes of chemicals. Accordingly, QSPR models were also developed for 14 chemical classes. The descriptors appearing in these models were discussed in relation to possible interaction mechanisms in soil sorption.
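The argument for class-specific models comes from class-wise residuals of a general model. The sketch below illustrates that step with ordinary least squares. It fits one general model, then reports, for each chemical class, the mean residual and the root-mean-square error (RMSE) of the general model next to that of a model refitted on the class alone. The use of plain OLS and the function names are assumptions, not the authors' CODESSA workflow.

The per-class RMSE here is in-sample, so it can never exceed the general model's RMSE on the same compounds. A real comparison should use held-out data, as the paper does with its validation sets.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def residuals_by_class(X, y, classes):
    """Compare a single general model with per-class refits.

    A systematic nonzero mean residual within a class is the kind of signal
    that motivates building class-specific models."""
    general = fit_ols(X, y)
    report = {}
    for c in np.unique(classes):
        mask = classes == c
        local = fit_ols(X[mask], y[mask])
        y_gen = predict(general, X[mask])
        report[c] = {
            "mean_residual": float(np.mean(y[mask] - y_gen)),
            "general_rmse": rmse(y[mask], y_gen),
            "class_rmse": rmse(y[mask], predict(local, X[mask])),
        }
    return report
```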
It has been suggested that the computational cost of correlated ab initio calculations could be reduced efficiently by using truncated basis sets on hydrogen atoms (Mintz et al., J Chem Phys 2004, 121, 5629). We now explore this proposal in the context of the conformational analysis of small molecules such as hydrogen peroxide, dimethyl ether, ethyl methyl ether, formic acid, methyl formate, and several small alcohols. For conformational analysis, truncated correlation-consistent basis sets that lack certain higher angular momentum functions on hydrogen atoms are found to be about as accurate as traditional Dunning basis sets. These basis sets can be combined with basis set extrapolation to estimate Hartree-Fock and second-order Møller-Plesset (MP2) energies. The resulting composite extrapolation model chemistries are significantly more accurate, and faster, than analogous single-point calculations with traditional correlation-consistent basis sets. On this set of molecules, the root-mean-square errors of the best composite extrapolation model chemistries are within 0.03 kcal/mol of traditional focal-point conformational energies. The applicability of composite extrapolation methods is illustrated by conformational analyses of tert-butanol and cyclohexanol. For comparison, conformational energies calculated with popular molecular mechanics force fields are also given.
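The abstract does not state which extrapolation formulas are used, so the following is a sketch under an explicit assumption. It applies the common two-point inverse-power form E_X = E_CBS + A·X⁻³ to the MP2 correlation energy, where X is the basis-set cardinal number (for example 3 for triple-zeta and 4 for quadruple-zeta). Solving the two equations for E_CBS gives E_CBS = (Y³E_Y − X³E_X)/(Y³ − X³).

The Hartree-Fock complete-basis estimate is taken as an input, because its extrapolation scheme is not specified here. All function names are hypothetical. Energies are in hartree, and conformational energies are converted to kcal/mol.

```python
HARTREE_TO_KCAL = 627.509474  # kcal/mol per hartree


def cbs_two_point(e_small, e_large, x_small, x_large, power=3):
    """Two-point extrapolation assuming E_X = E_CBS + A * X**-power.

    Multiplying each equation by X**power and subtracting eliminates A."""
    xs, xl = x_small ** power, x_large ** power
    return (xl * e_large - xs * e_small) / (xl - xs)


def composite_energy(e_hf_cbs, ecorr_small, ecorr_large, x_small, x_large):
    """Composite estimate: HF limit (supplied) plus extrapolated MP2 correlation."""
    return e_hf_cbs + cbs_two_point(ecorr_small, ecorr_large, x_small, x_large)


def relative_kcal(e_conformer, e_reference):
    """Conformational energy of one conformer relative to a reference, in kcal/mol."""
    return (e_conformer - e_reference) * HARTREE_TO_KCAL
```

Because the same extrapolation is applied to every conformer, much of the remaining basis-set error is expected to cancel in the relative energies. That cancellation is what makes cheaper truncated basis sets attractive for this application.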