Mutagenicity is one of the numerous adverse properties of a compound that hampers its potential to become a marketable drug. Toxic properties can often be related to chemical structure, more specifically, to particular substructures, which are generally identified as toxicophores. A number of toxicophores have already been identified in the literature. This study aims at increasing the current degree of reliability and accuracy of mutagenicity predictions by identifying novel toxicophores from the application of new criteria for toxicophore rule derivation and validation to a considerably sized mutagenicity dataset. For this purpose, a dataset of 4337 molecular structures with corresponding Ames test data (2401 mutagens and 1936 nonmutagens) was constructed. An initial substructure-search of this dataset showed that most mutagens were detected by applying only eight general toxicophores. From these eight, more specific toxicophores were derived and approved by employing chemical and mechanistic knowledge in combination with statistical criteria. A final set of 29 toxicophores containing new substructures was assembled that could classify the mutagenicity of the investigated dataset with a total classification error of 18%. Furthermore, mutagenicity predictions of an independent validation set of 535 compounds were performed with an error percentage of 15%. Since these error percentages approach the average interlaboratory reproducibility error of Ames tests, which is 15%, it was concluded that these toxicophores can be applied to risk assessment processes and can guide the design of chemical libraries for hit and lead optimization.
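The classification scheme in this abstract can be sketched as follows: a compound is predicted mutagenic when it matches at least one toxicophore, and the error is the fraction of predictions that disagree with the Ames result. This is a minimal sketch only. The toxicophore names, the compounds, and the helper names (`Compound`, `predict_mutagen`, `classification_error`) are hypothetical placeholders, not the study's 29 toxicophores or its data, and the substructure matching step itself is assumed to have been done elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    toxicophore_hits: frozenset  # names of toxicophores matched (hypothetical)
    ames_positive: bool          # experimental Ames test outcome


def predict_mutagen(compound: Compound) -> bool:
    """Assumed rule: any toxicophore match means a mutagen prediction."""
    return len(compound.toxicophore_hits) > 0


def classification_error(compounds) -> float:
    """Fraction of compounds whose prediction disagrees with the Ames result."""
    wrong = sum(predict_mutagen(c) != c.ames_positive for c in compounds)
    return wrong / len(compounds)


# Toy data, invented purely for illustration.
toy_set = [
    Compound("cmpd_1", frozenset({"toxicophore_A"}), True),
    Compound("cmpd_2", frozenset(), False),
    Compound("cmpd_3", frozenset({"toxicophore_B"}), False),  # false positive
    Compound("cmpd_4", frozenset({"toxicophore_A", "toxicophore_B"}), True),
    Compound("cmpd_5", frozenset(), False),
]

toy_error = classification_error(toy_set)
print(f"toy classification error: {toy_error:.0%}")
```

In the toy set one of five compounds is misclassified, so the error is 20%. The figures reported in the abstract (18% and 15%) come from the authors' own datasets and cannot be reproduced from this sketch.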
Substructure mining algorithms are important drug discovery tools since they can find substructures that affect physicochemical and biological properties. Current methods, however, only consider a part of all chemical information that is present within a data set of compounds. Therefore, the overall aim of our study was to enable more exhaustive data mining by designing methods that detect all substructures of any size, shape, and level of chemical detail. A means of chemical representation was developed that uses atomic hierarchies, thus enabling substructure mining to consider general and/or highly specific features. As a proof-of-concept, the efficient, multipurpose graph mining system Gaston learned substructures of any size and shape from a mutagenicity data set that was represented in this manner. From these substructures, we extracted a set of only six nonredundant, discriminative substructures that represent relevant biochemical knowledge. Our results demonstrate the individual and synergistic importance of elaborate chemical representation and mining for nonlinear substructures. We conclude that the combination of elaborate chemical representation and Gaston provides an excellent method for 2D substructure mining as this recipe systematically explores all substructures in different levels of chemical detail.
Communicated by Mauno Vihinen
The superfamily of human G protein-coupled receptors (GPCRs) is large and regulates a plethora of important physiological processes by transducing extracellular signals over cell membranes. A diversity of natural variants occurs in these receptors, including rare mutations and common polymorphisms. These variants differ in their impact on DNA, ranging from single nucleotide polymorphisms (SNPs) to copy number variants, and in their impact on protein function. Natural variants furthermore vary in their effects on human phenotypes from neutral to disease-associated. As mutation data are highly dispersed over numerous sources, a single resource for variants would aid investigators of GPCRs. The GPCR NaVa database therefore integrates data on natural variants in human GPCRs from online databases, the scientific literature, and patents. Where available, variants contain information on their location in the DNA (and protein sequence), the involved nucleotides (and amino acids), the average frequency of each allele, reported disease associations, and references to public databases and the scientific literature. The GPCR NaVa database aims to facilitate studies into pharmacogenetics, genotype-phenotype, and structure-function relationships of GPCRs. The GPCR NaVa database is interlinked with the family-specific GPCRDB resource and is accessible as a stand-alone database through a user-friendly website at http://nava.
Subgraph mining is an area of research in which, given a set of graphs, we search for (connected) subgraphs contained in these graphs. In this paper we focus on the analysis of graph patterns where the graphs are molecules and the subgraphs are patterns. In the analysis of fragments one is interested in the molecules in which the patterns occur. These occurrence data can be very extensive, and in this paper we introduce a visualization technique that makes them more accessible. The user does not have to browse all the occurrences in search of patterns occurring in the same molecules; instead the user can directly see which subgraphs are of interest.
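The idea of finding patterns that occur in the same molecules can be illustrated with a small co-occurrence computation. This is a sketch under stated assumptions: the abstract does not say how overlap is measured or displayed, so the Jaccard overlap used here is a stand-in chosen for illustration, and the pattern names, molecule ids, and the function `cooccurrence` are hypothetical.

```python
from itertools import combinations


def cooccurrence(occurrences):
    """Score every pair of patterns by the overlap of their molecule sets.

    occurrences maps a pattern name to the set of molecule ids containing it.
    The score is the Jaccard overlap (an assumption, not the paper's measure).
    """
    scores = {}
    for p, q in combinations(sorted(occurrences), 2):
        mols_p, mols_q = occurrences[p], occurrences[q]
        union = mols_p | mols_q
        scores[(p, q)] = len(mols_p & mols_q) / len(union) if union else 0.0
    return scores


# Toy occurrence lists, invented for illustration.
occ = {
    "P1": {1, 2, 3},
    "P2": {2, 3},
    "P3": {4},
}

scores = cooccurrence(occ)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

A matrix of such pairwise scores is the kind of summary a visualization could present, so that pairs with high overlap stand out without browsing every occurrence list.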