It is important that in silico models for use in chemical safety legislation, such as REACH, are compliant with the OECD Principles for the Validation of (Q)SARs. Structural alert models can be useful under these circumstances but lack an adequately defined applicability domain. This paper examines several methods of domain definition for structural alert models, with the aim of assessing which is the most useful. Specifically, these methods were the use of fragments, chemical descriptor ranges, structural similarity, and specific applicability domain definition software. Structural alerts for mutagenicity in Derek for Windows (DfW) were used as examples, and Ames test data were used to define and test the domain of chemical space where the alerts produce reliable results. The usefulness of each domain was assessed on the criterion that confidence in the correctness of predictions should be greater inside the domain than outside it. By using a combination of structural similarity and chemical fragments, a domain was produced where the majority of correct positive predictions for mutagenicity were within the domain and a large proportion of the incorrect positive predictions were outside it. However, this was not found for the negative predictions; the percentages of true and false predictions of inactivity differed little between the inside and the outside of the applicability domain. A hypothesis for this difference between positive and negative predictions is that differences in structure between training and test compounds are more likely to remove the toxic potential of a compound containing a structural alert than to add an unknown mechanism of action (structural alert) to a molecule which does not already contain an alert. This could be especially true for well-studied end points such as the Ames assay, where the majority of mechanisms of action are likely to be known.
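The assessment criterion in the abstract above (predictions should be more reliable inside the domain than outside it) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes each compound is already reduced to a set of fragment labels, combines a fragment check with a Tanimoto nearest-neighbour check, and uses a hypothetical similarity threshold of 0.5. All function names and fragment labels are illustrative.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fragment sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def in_domain(query, training, threshold=0.5):
    """Hypothetical combined domain test.

    A query compound is inside the domain only if
    (1) every one of its fragments occurs somewhere in the training set, and
    (2) its most similar training compound reaches `threshold`.
    """
    known = set().union(*training)
    if not set(query) <= known:
        return False
    best = max((tanimoto(query, t) for t in training), default=0.0)
    return best >= threshold


def accuracy_by_domain(predictions, training, threshold=0.5):
    """Count correct predictions inside and outside the domain.

    `predictions` is a list of (fragments, predicted, observed) tuples.
    Returns {"inside": (correct, total), "outside": (correct, total)}.
    """
    counts = {"inside": [0, 0], "outside": [0, 0]}
    for fragments, predicted, observed in predictions:
        key = "inside" if in_domain(fragments, training, threshold) else "outside"
        counts[key][1] += 1
        counts[key][0] += int(predicted == observed)
    return {key: tuple(value) for key, value in counts.items()}
```

Under this sketch, a domain definition would count as useful when the proportion of correct predictions in `"inside"` is clearly higher than in `"outside"`. The abstract reports that this held for positive predictions but not for negative ones.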
The design of new alerts, that is, collections of structural features observed to result in toxicological activity, can be a slow process and may require significant input from toxicology and chemistry experts. A method has therefore been developed to help automate alert identification by mining descriptions of activating structural features directly from toxicity data sets. The method is based on jumping emerging pattern mining, which is applied to a set of toxic and nontoxic compounds that are represented using atom pair descriptors. Using the resulting jumping emerging patterns, it is possible to cluster toxic compounds into groups defined by the presence of shared structural features and to arrange the clusters into hierarchies. The methodology has been tested on a number of data sets for Ames mutagenicity, oestrogenicity, and hERG channel inhibition end points. These tests have shown the method to be effective at clustering the data sets around minimal jumping emerging patterns and at finding descriptions of potentially activating structural features. Furthermore, the mined structural features have been shown to be related to some of the known alerts for all three tested end points.
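A jumping emerging pattern is a set of descriptors that occurs in at least one compound of one class (here, toxic) and in no compound of the other class. The sketch below illustrates that idea with a brute-force search for minimal patterns. It is an assumption-laden toy, not the mining algorithm used in the paper. Compounds are assumed to be given as sets of descriptor strings, `max_size` is a hypothetical cap that keeps the enumeration small, and all names are illustrative.

```python
from itertools import combinations


def support(pattern, compounds):
    """Number of compounds whose descriptor set contains every item in `pattern`."""
    return sum(1 for compound in compounds if pattern <= compound)


def minimal_jeps(toxic, nontoxic, max_size=3):
    """Brute-force search for minimal jumping emerging patterns.

    A pattern qualifies if it occurs in at least one toxic compound and in
    no nontoxic compound. It is minimal if no proper subset also qualifies.
    """
    items = sorted(set().union(*toxic))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            # Skip supersets of patterns already found: they are not minimal.
            if any(smaller < pattern for smaller in found):
                continue
            if support(pattern, toxic) > 0 and support(pattern, nontoxic) == 0:
                found.append(pattern)
    return found


def cluster_by_pattern(toxic, patterns):
    """Group toxic compounds (by index) under each pattern they contain."""
    return {
        pattern: [i for i, compound in enumerate(toxic) if pattern <= compound]
        for pattern in patterns
    }
```

Each key returned by `cluster_by_pattern` corresponds to one group of toxic compounds sharing a structural feature. Arranging those groups into hierarchies, as the abstract describes, is not attempted here.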
Knowledge-based systems for toxicity prediction are typically based on rules, known as structural alerts, that describe relationships between structural features and different toxic effects. The identification of structural features associated with toxicological activity can be a time-consuming process and often requires significant input from domain experts. Here, we describe an emerging pattern mining method for the automated identification of activating structural features in toxicity data sets that is designed to help expedite the process of alert development. We apply the contrast pattern tree mining algorithm to generate a set of emerging patterns of structural fragment descriptors. Using the emerging patterns, it is possible to form hierarchical clusters of compounds that are defined by the presence of common structural features and represent distinct chemical classes. The method has been tested on a large public in vitro mutagenicity data set and a public hERG channel inhibition data set and is shown to be effective at identifying common toxic features and recognizable classes of toxicants. We also describe how knowledge developers can use emerging patterns to improve the specificity and sensitivity of an existing expert system.
The discovered patterns are used to develop new structural alerts for mutagenicity in the Derek Nexus expert system.