This work proposes a new structure-activity relationship (SAR) approach for mining molecular fragments that act as structural alerts for biological activity. The entire process is designed to follow human reasoning, both to make the predictions more reliable and to give the user clear control so that customized requirements can be met. The approach has been tested on the mutagenicity endpoint, where it showed strong predictive performance and, more interestingly, brought to the surface much of the knowledge already collected in the literature as well as new evidence.
For six random splits, one-variable models of rat toxicity (negative decimal logarithm of the 50% lethal dose [pLD50], oral exposure) have been calculated with the CORAL software (http://www.insilico.eu/coral/). The total number of compounds considered is 689. Additional global attributes of the simplified molecular input line entry system (SMILES) have been examined as a way to improve the optimal SMILES-based descriptors. These global SMILES attributes represent the presence of certain chemical elements and of different kinds of chemical bonds (double, triple, and stereochemical). The "classic" scheme of building quantitative structure-property/activity relationships was compared with the balance of correlations (BC) with ideal slopes. For all six random splits, the best predictions are obtained when the BC and the global SMILES attributes are both included in the modeling process. The average statistical characteristics for the external test set are: n = 119 ± 6.4, R² = 0.7371 ± 0.013, and root mean square error = 0.360 ± 0.037.
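The idea of global SMILES attributes can be illustrated with a minimal Python sketch that flags the presence of selected elements and bond types in a SMILES string. This is not CORAL's implementation: the element list `ELEMENTS`, the function names, and the choice to treat `@`, `/`, and `\` as stereochemical markers are all assumptions made for illustration, since the abstract says only that "some chemical elements" and double, triple, and stereochemical bonds are encoded.

```python
import re

# Matches a bracket atom, a two-letter halogen, or a one-letter organic-subset atom
# (upper case = aliphatic, lower case = aromatic).
ATOM_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]")
# Extracts the element symbol from a bracket atom such as [N+], [nH] or [13C@@H].
BRACKET_RE = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")

# Hypothetical element list; the source does not say which elements are used.
ELEMENTS = ("N", "O", "S", "P", "F", "Cl", "Br", "I")


def elements_in(smiles: str) -> set:
    """Return the set of element symbols that occur in a SMILES string."""
    found = set()
    for token in ATOM_RE.findall(smiles):
        if token.startswith("["):
            match = BRACKET_RE.match(token)
            if match:
                found.add(match.group(1).capitalize())
        else:
            # Aromatic atoms are lower case; normalise "n" to "N", and so on.
            found.add(token.capitalize())
    return found


def global_smiles_attributes(smiles: str) -> dict:
    """Return 0/1 presence flags for selected elements and bond types."""
    present = elements_in(smiles)
    attrs = {el: int(el in present) for el in ELEMENTS}
    attrs["double"] = int("=" in smiles)
    attrs["triple"] = int("#" in smiles)
    # Assumption: chirality (@) and cis/trans (/ and \) marks count as stereo.
    attrs["stereo"] = int(any(ch in smiles for ch in "@/\\"))
    return attrs


if __name__ == "__main__":
    # L-alanine: contains N and O, one double bond, one stereocentre.
    print(global_smiles_attributes("C[C@H](N)C(=O)O"))
```

Tokenising atoms before testing for elements avoids false positives such as reading the `N` in `[Na+]` as nitrogen. In a CORAL-style model such flags would be combined with local SMILES attributes and given optimised correlation weights; that step is not shown here.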