Although the concept of similarity is convenient for humans, a formal definition of similarity between chemical compounds is needed to enable automatic decision‐making. The objective of similarity measures in toxicology and drug design is to allow assessment of chemical activities. The ideal similarity measure should be relevant to the activity of interest. The relevance could be established by exploiting knowledge about the fundamental chemical and biological processes responsible for the activity. Unfortunately, this knowledge is rarely available, and therefore different approximations have been developed based on similarity between structures or descriptor values. Various methods are reviewed, ranging from two‐dimensional, three‐dimensional and field approaches to recent methods based on “Atoms in Molecules” theory. All these methods attempt to describe chemical compounds by a set of numerical values and define some means of comparing them. The review analyzes potential pitfalls of this methodology: loss of information in the representations of molecular structures, and the relevance of a particular representation and chosen similarity measure to the activity. A brief review of known methods for descriptor selection is also provided. The popular “neighborhood behavior” principle is criticized, since proximity with respect to descriptors does not necessarily mean proximity with respect to activity. Structural similarity should also be used with care, as it does not always imply similar activity, as shown by examples. We note that similarity measures and classification techniques based on distances rely on certain assumptions about the data distribution. If these assumptions are not satisfied for a given dataset, the results could be misleading. Similarity in descriptor space is also discussed in the context of applicability domain assessment of QSAR models.
Finally, it is shown that descriptor‐based similarity analysis is prone to errors if the relationship between the activity and the descriptors has not been previously established. The use of a particular similarity measure should be justified for every specific activity, either by expert knowledge or by data modeling techniques.
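To make the idea of describing compounds by numerical values and then comparing them concrete, here is a minimal sketch of two commonly used measures. The abstract does not name specific measures, so the choice of the Tanimoto coefficient on binary fingerprints and the Euclidean distance on descriptor vectors is an assumption, and the fingerprint bits are made-up illustration values.

```python
from math import sqrt

def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bit positions."""
    if not bits_a and not bits_b:
        return 1.0  # convention assumed here: two empty fingerprints count as identical
    return len(bits_a & bits_b) / len(bits_a | bits_b)

def euclidean(desc_x, desc_y):
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(desc_x) != len(desc_y):
        raise ValueError("descriptor vectors must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(desc_x, desc_y)))

# Hypothetical fingerprints: 3 shared bits out of 5 distinct bits.
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8, 9}
similarity = tanimoto(fp_a, fp_b)
```

Neither number says anything about activity on its own, which is the caution raised above: a high fingerprint similarity or a small descriptor distance is only meaningful once the descriptors have been related to the activity of interest.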
A novel mechanistic modeling approach has been developed that assesses chemical biodegradability quantitatively. It combines an expert system that predicts the biotransformation pathway with a probabilistic model that calculates the probabilities of the individual transformations. The expert system contains a library of hierarchically ordered individual transformations and a substructure matching engine. The hierarchy in the expert system was set in descending order of the individual transformation probabilities. The integrated principal catabolic steps are derived from a set of metabolic pathways predicted for each chemical in the training set, and each encompasses more than one real biodegradation step to improve the speed of predictions. In the current work, we modeled O₂ yield during the OECD 302 C (MITI I) test. The MITI-I database of 532 chemicals was used as a training set. To make biodegradability predictions, the model needs only the structure of a chemical. The output is given as a percentage of the theoretical biological oxygen demand (BOD). The model allows potentially persistent catabolic intermediates and their molar amounts to be identified. The data in the training set agreed well with the calculated BODs (r² = 0.90) over the entire range, i.e. a good fit was observed for readily, intermediately and poorly degradable chemicals. With 60% ThOD as the cut-off value, the model correctly predicted 98% of the readily biodegradable structures and 96% of the not readily biodegradable structures. Cross-validation, leaving out 25% of the data four times, resulted in Q² = 0.88 between observed and predicted values. The approach and results presented were used to develop CATABOL, a computer program for biodegradability prediction.
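The classification and cross-validation steps described above can be sketched as follows. This is not the CATABOL implementation: the function names are hypothetical, treating a value exactly at the cut-off as readily biodegradable is an assumption, and the abstract does not define Q², so the usual 1 − PRESS/SS form is assumed here.

```python
def is_readily_biodegradable(bod_percent_thod, cutoff=60.0):
    """Classify a predicted BOD (as % of ThOD) against the cut-off (>= is assumed)."""
    return bod_percent_thod >= cutoff

def leave_quarter_out_splits(n_items):
    """Yield four (train, test) index lists; each test fold holds about 25% of the items."""
    indices = list(range(n_items))
    for fold in range(4):
        test = indices[fold::4]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

def q_squared(observed, predicted):
    """Assumed definition: 1 - PRESS / total sum of squares of the observed values."""
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - press / ss_total

# Usage: each chemical is held out exactly once across the four folds.
folds = list(leave_quarter_out_splits(8))
```

In a full cross-validation, the model would be refitted on each `train` subset and its predictions for the matching `test` subset collected before computing `q_squared` over all held-out chemicals.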
Recently we described the Common REactivity PAttern (COREPA) technique to screen data sets of diverse structures for their ability to serve as ligands for steroid hormone receptors [1]. The approach identifies and quantifies similar global and local stereoelectronic characteristics associated with active ligands through a comparison of energetically reasonable conformer distributions for selected descriptors. For each stereoelectronic descriptor selected, discrete conformer distributions from a training set of ligands are evaluated and parameter ranges common for conformers from all the chemicals in the training set are identified. The use of discrete partitions of parameter ranges to define common reactivity patterns can, however, influence the outcome of the algorithm. To address this limitation, the original method has been extended by approximating continuous conformer distributions as probability distributions. The COREPA-Continuous (COREPA-C) algorithm assesses the common reactivity pattern of biologically similar molecules in terms of a product of probability distributions, rather than a collection of common population ranges determined by examination of discrete partitions of a distribution. To illustrate the algorithm, common reactivity patterns based on interatomic distance and charge on heteroatoms were developed and evaluated using a set of 28 androgen receptor ligands. Notable attributes of the COREPA-C algorithm include flexibility in establishing stereoelectronic descriptor criteria for identifying active and nonactive compounds and the ability to quantify three-dimensional chemical similarity without the need to predetermine a toxicophore or align compound(s) to a lead ligand.
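The idea of expressing a common pattern as a product of per-molecule probability distributions can be illustrated with a small sketch. This is not the published COREPA-C algorithm: the Gaussian kernel estimate, the bandwidth, the grid, and the conformer distances are all assumptions chosen for illustration.

```python
from math import exp, pi, sqrt

def kernel_density(values, grid, bandwidth):
    """Gaussian kernel estimate of one molecule's conformer distribution on a grid."""
    norm = 1.0 / (len(values) * bandwidth * sqrt(2.0 * pi))
    return [norm * sum(exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
            for g in grid]

def common_pattern(per_molecule_values, grid, bandwidth=0.2):
    """Pointwise product of the per-molecule densities, renormalized to unit area."""
    step = grid[1] - grid[0]  # assumes a uniformly spaced grid
    product = [1.0] * len(grid)
    for values in per_molecule_values:
        density = kernel_density(values, grid, bandwidth)
        product = [p * d for p, d in zip(product, density)]
    area = sum(product) * step
    if area == 0.0:
        raise ValueError("the molecules share no populated descriptor region")
    return [p / area for p in product]

# Hypothetical interatomic distances (in angstroms) for conformers of three ligands.
ligands = [[10.0, 10.2, 12.0], [9.9, 10.1], [10.0, 10.3, 8.0]]
grid = [8.0 + 0.05 * i for i in range(101)]
pattern = common_pattern(ligands, grid)
peak = grid[pattern.index(max(pattern))]
```

Because the densities are multiplied, a descriptor region populated by only one ligand (such as the values 12.0 or 8.0 above) contributes almost nothing, while the region populated by all ligands dominates, without choosing bin boundaries in advance.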