QSARINS (QSAR-INSUBRIA) is new software for the development and validation of multiple linear regression (MLR) Quantitative Structure-Activity Relationship (QSAR) models, using the Ordinary Least Squares (OLS) method and a Genetic Algorithm (GA) for variable selection. The program focuses mainly on the external validation of QSAR models. It implements tools for exploratory analysis of datasets by Principal Component Analysis, pre-reduction of input molecular descriptors, splitting of datasets into training and prediction sets, detection of outliers and of interpolated or extrapolated predictions, internal and external validation by different parameters, consensus modeling, and various plots for visualization. QSARINS is a user-friendly platform for QSAR modeling in agreement with the OECD Principles and for analyzing the reliability of the predicted data obtained. It also implements the Insubria PBT Index model, which predicts the cumulative behaviour of new chemicals as Persistent, Bioaccumulative and Toxic (PBT) substances. Additionally, QSARINS allows the user to validate single models, including models previously developed with other software.
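To make the MLR-OLS workflow above more concrete, the following is a minimal sketch (not QSARINS code) of three quantities such a program typically reports: the OLS coefficients, the leave-one-out cross-validated Q²_LOO used for internal validation, and the leverages used to flag extrapolated predictions. It assumes descriptors are supplied as plain lists of rows, that an intercept is added, and that X'X is invertible. The warning leverage h* = 3p'/n (p' = number of model parameters) is the cut-off commonly used in Williams plots. The function names `invert` and `fit_mlr` are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_mlr(descriptors, y):
    """Hypothetical helper: OLS coefficients (intercept first), leverages, Q2_LOO and h*."""
    X = [[1.0] + list(row) for row in descriptors]  # prepend the intercept column
    n, p = len(X), len(X[0])                         # p is p' = descriptors + 1
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    xtx_inv = invert(xtx)
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    coef = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(c * x for c, x in zip(coef, row)) for row in X]
    # Leverage h_i = x_i' (X'X)^-1 x_i, the diagonal of the hat matrix.
    lev = [sum(row[i] * xtx_inv[i][j] * row[j] for i in range(p) for j in range(p))
           for row in X]
    # For OLS the leave-one-out residual equals e_i / (1 - h_i), so PRESS needs no refits.
    press = sum(((yi - fi) / (1.0 - hi)) ** 2 for yi, fi, hi in zip(y, fitted, lev))
    y_mean = sum(y) / n
    tss = sum((yi - y_mean) ** 2 for yi in y)
    return {
        "coef": coef,
        "leverage": lev,
        "q2_loo": 1.0 - press / tss,
        "h_star": 3.0 * p / n,  # warning leverage for the Williams plot
    }
```

A compound whose leverage exceeds `h_star` is structurally influential in the training set; for a new chemical, a leverage above `h_star` suggests the prediction is an extrapolation outside the structural applicability domain.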
A case study of the toxicity of (benzo)triazoles ((B)TAZs) to the alga Pseudokirchneriella subcapitata is used to discuss some problems and solutions in QSAR modeling, particularly in the environmental context. While developing MLR-OLS models based on molecular descriptors calculated by various QSAR software tools (commercial DRAGON; free PaDEL-Descriptor and QSPR-THESAURUS), the work describes the relevance of data curation (not only of experimental data, but also of chemical structures and of the input formats used to calculate molecular descriptors), as well as the crucial points of QSAR model validation and of potential application to new chemicals: internal robustness, exclusion of chance correlation, external predictivity, and applicability domain. The utility of consensus models is also highlighted. This work summarizes a methodology for a rigorous statistical approach to obtaining reliable QSAR predictions, including for a large number of (B)TAZs in the ECHA preregistration list of REACH (even when starting from limited experimental data). It also revealed some ambiguities and discrepancies in SMILES notations from different databases, highlighted some general problems related to QSAR model generation, and was useful in the implementation of the PaDEL-Descriptor software.
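"Exclusion of chance correlation" is usually checked by Y-scrambling: the responses are randomly permuted, the model is refitted many times, and the scrambled R² values are compared with the real one. The sketch below is a simplified illustration under a stated assumption: it uses a one-descriptor OLS model (whose R² equals the squared Pearson correlation) to stay short, whereas real MLR models refit all selected descriptors, and the exact statistics reported by a given program may differ. The function names are hypothetical.

```python
import random


def r_squared(x, y):
    """R^2 of a one-descriptor OLS model with intercept (the squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def y_scrambling(x, y, iterations=1000, seed=0):
    """Refit the model on randomly permuted responses and collect the R^2 values.

    If the scrambled R^2 values stay well below the real one, the original
    correlation is unlikely to be due to chance.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = list(y)
    scores = []
    for _ in range(iterations):
        rng.shuffle(shuffled)
        scores.append(r_squared(x, shuffled))
    return r_squared(x, y), scores
```

A model whose real R² sits among the scrambled values would be suspect, since random responses fit it about as well as the real data.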
Polybrominated diphenyl ethers (PBDEs) are a group of brominated flame retardants (BFRs) that were widely used in a variety of consumer products. Because of evidence of toxic effects on different organisms and humans, as well as the ubiquitous presence of these compounds, PBDEs are considered an emerging group of toxic and persistent organic pollutants. However, because few experimental data are available, little is still known about the properties of most of these chemicals. In this study, several physicochemical properties, experimentally available for only a few PBDE congeners and hexabromobenzene (HBB), were investigated through a modelling approach based on quantitative structure-property relationships (QSPR). OLS regression models based on theoretical molecular descriptors were developed for Henry's law constant, melting point, subcooled liquid vapor pressure, water solubility, octanol-air partition coefficient, and octanol-water partition coefficient. These models can be useful for predicting the large amount of missing data and for designing safer alternatives to hazardous BFRs. The innovative aspect of the proposed models, compared with those already published in the literature, is that they were developed according to the OECD principles for the regulatory acceptability of QSARs. This includes validation for predictivity (by both internal and external statistical validation) and inspection of the applicability domain.
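External statistical validation compares a model's predictions for prediction-set chemicals with their observed values. The sketch below computes several parameters commonly used for this in the QSAR literature: Q²_F1 (scaled by the training-set mean), Q²_F2 (scaled by the prediction-set mean), Q²_F3 (mean squared error relative to the training-set variance) and the concordance correlation coefficient (CCC). Whether the models above were judged with exactly these parameters and thresholds is an assumption; `external_metrics` is a hypothetical name.

```python
def external_metrics(y_train, y_ext, y_ext_pred):
    """External validation statistics for a prediction (external) set.

    y_train    : observed responses of the training set
    y_ext      : observed responses of the prediction set
    y_ext_pred : model predictions for the prediction set
    """
    n_tr, n_ext = len(y_train), len(y_ext)
    mean_tr = sum(y_train) / n_tr
    mean_ext = sum(y_ext) / n_ext
    mean_pred = sum(y_ext_pred) / n_ext

    # Sum of squared prediction errors on the external set.
    press_ext = sum((o - p) ** 2 for o, p in zip(y_ext, y_ext_pred))
    ss_ext_about_tr_mean = sum((o - mean_tr) ** 2 for o in y_ext)
    ss_ext = sum((o - mean_ext) ** 2 for o in y_ext)
    ss_tr = sum((o - mean_tr) ** 2 for o in y_train)

    # Concordance correlation coefficient between observed and predicted values.
    cov = sum((o - mean_ext) * (p - mean_pred) for o, p in zip(y_ext, y_ext_pred))
    ss_pred = sum((p - mean_pred) ** 2 for p in y_ext_pred)
    ccc = 2.0 * cov / (ss_ext + ss_pred + n_ext * (mean_ext - mean_pred) ** 2)

    return {
        "Q2_F1": 1.0 - press_ext / ss_ext_about_tr_mean,
        "Q2_F2": 1.0 - press_ext / ss_ext,
        "Q2_F3": 1.0 - (press_ext / n_ext) / (ss_tr / n_tr),
        "CCC": ccc,
    }
```

Reporting several parameters together is deliberate: each normalizes the external error differently, so a model that looks good on one can look worse on another, particularly when the prediction-set mean differs from the training-set mean.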
The crucial importance of the three central OECD principles for quantitative structure-activity relationship (QSAR) model validation is highlighted in a case study of the tropospheric degradation of volatile organic compounds (VOCs) by OH radicals, applied to two CADASTER chemical classes (PBDEs and (benzo-)triazoles). The application of any QSAR model to chemicals without experimental data depends largely on the model's reproducibility by the user. The reproducibility of an unambiguous algorithm (OECD Principle 2) is ensured by redeveloping MLR models using both an updated version of the DRAGON software for molecular descriptor calculation and some freely available online descriptors. The Genetic Algorithm confirmed its ability to consistently select the most informative descriptors regardless of the input pool of variables. The ability of the GA-selected descriptors to model chemicals not used in model development is verified by three different splittings (random by response, K-ANN and K-means clustering), thus ensuring the external predictivity of the new models independently of the training/prediction set composition (OECD Principle 5). The relevance of checking the structural applicability domain becomes very evident when the predictions for CADASTER chemicals from the new models proposed herein are compared with those obtained by EPI Suite.
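Splitting "by response" aims to give the training and prediction sets similar coverage of the response range. The exact procedure used in the study is not described here, so the following is one plausible sketch under that assumption: compounds are sorted by response, the sorted list is cut into consecutive blocks of `block` compounds, and one randomly chosen compound per block goes to the prediction set (an incomplete final block stays in training). The function name and the `block` parameter are hypothetical.

```python
import random


def split_by_response(ids, responses, block=4, seed=0):
    """Stratified random split along the sorted response values.

    Returns (training_ids, prediction_ids); roughly 1/block of the compounds
    end up in the prediction set, spread over the whole response range.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    order = sorted(range(len(ids)), key=lambda i: responses[i])
    training, prediction = [], []
    for start in range(0, len(order), block):
        chunk = order[start:start + block]
        if len(chunk) < block:
            # Keep an incomplete tail block entirely in the training set.
            training.extend(ids[i] for i in chunk)
            continue
        chosen = rng.choice(chunk)
        for i in chunk:
            (prediction if i == chosen else training).append(ids[i])
    return training, prediction
```

Repeating the split with different seeds (or with a clustering-based method such as K-means) and checking that the selected descriptors still predict the held-out compounds well is one way to show that predictivity does not depend on a particular training/prediction set composition.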