A major problem in the design of screening systems for substructure searches of chemical structure files is developing a methodology for selecting an optimal set of structural characteristics to act as screens. The set chosen for a particular application will depend on the characteristics of the collection, as well as on its size and growth rate. A strategy is described that takes account of the disparate frequencies of the various species of fragments in a data-base by using differential and, in part, hierarchical levels of description. The distributions of a variety of structural characteristics, including bond-centered, atom-centered, and ring fragments, in a 30,000-compound sample of the Chemical Abstracts Service Registry System are summarized. Implementation of the approach, using primarily bond-centered fragments, by means of simple and highly efficient computer programs, is detailed.
The need to provide flexible and economical searches of chemical structure files, to fulfil chemists' requirements for substructure searching within more general chemical information systems, poses complex problems with interesting practical and theoretical implications. Many approaches have been advocated,1 embodying a variety of viewpoints. In no respect has opinion been more varied than in the design of screening systems. These entail the selection of structural characteristics on the basis of which an approximate match between queries and potential answers is made. This stage may be followed by a more detailed search involving atom-bond-atom path tracing.
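The two-stage idea above (an approximate screen match, then detailed path tracing) can be illustrated with bond-centered fragments. The sketch below is a generic illustration under stated assumptions, not the authors' programs: the structure representation, the fragment notation (element symbols plus bond order), and the frequency band used to pick screens are all hypothetical choices made here. A structure passes the screen if it contains every selected screen that the query contains; passing is only a necessary condition, so hits would still need an atom-bond-atom comparison.

```python
from collections import Counter
from itertools import chain

# Toy representation (an assumption for illustration, not the authors' record
# format): a structure is a list of bonds (atom symbol, atom symbol, bond order),
# with hydrogens omitted.
Structure = list[tuple[str, str, int]]

BOND_SYMBOLS = {1: "-", 2: "=", 3: "#"}


def bond_fragments(structure: Structure) -> frozenset[str]:
    """Return the bond-centered fragments of a structure, e.g. 'C-N' or 'C=O'.

    The two atom symbols are sorted so that C-N and N-C give the same fragment.
    """
    fragments = set()
    for a, b, order in structure:
        x, y = sorted((a, b))
        fragments.add(f"{x}{BOND_SYMBOLS[order]}{y}")
    return frozenset(fragments)


def select_screens(collection: list[Structure], min_freq: float, max_freq: float) -> set[str]:
    """Keep fragments whose frequency (fraction of structures containing them)
    falls within [min_freq, max_freq].

    A fragment present in nearly every structure eliminates almost nothing, so it
    is dropped; the band limits themselves are illustrative assumptions.
    """
    counts = Counter(chain.from_iterable(bond_fragments(s) for s in collection))
    n = len(collection)
    return {frag for frag, c in counts.items() if min_freq <= c / n <= max_freq}


def screen_search(collection: list[Structure], query: Structure, screens: set[str]) -> list[int]:
    """Return indices of candidates: every screen set by the query must also be
    set by the candidate. Candidates are only approximate matches and would
    still need a detailed atom-bond-atom comparison."""
    query_screens = bond_fragments(query) & screens
    return [i for i, s in enumerate(collection) if query_screens <= bond_fragments(s)]


toy_collection: list[Structure] = [
    [("C", "C", 1), ("C", "O", 1)],  # ethanol skeleton: C-C, C-O
    [("C", "C", 1), ("C", "O", 2)],  # acetaldehyde skeleton: C-C, C=O
    [("C", "N", 1), ("C", "C", 1)],  # ethylamine skeleton: C-N, C-C
    [("C", "O", 2), ("C", "N", 1)],  # formamide skeleton: C=O, C-N
]

if __name__ == "__main__":
    screens = select_screens(toy_collection, 0.2, 0.6)
    print(sorted(screens))  # ['C-N', 'C-O', 'C=O']; C-C occurs in 3 of 4, outside the band
    # Query with C-C and C-N bonds: only C-N is a screen, so structure 3
    # (no C-C bond) still passes and would be rejected only by the detailed search.
    print(screen_search(toy_collection, [("C", "C", 1), ("C", "N", 1)], screens))  # [2, 3]
```

The false drop on structure 3 is deliberate: it shows why the choice of which fragments become screens governs how much work is left for the detailed search.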
The adequacy of the characteristics selected for indexing the collection is critical both to how well the system can answer the variety of queries addressed to it and to the overall cost of searching.
The work reported in this paper arose from the conviction that it was essential to develop a general methodology for the design of screening systems, one that could be applied with equal validity to collections differing widely in both size and composition. (The need for such a methodology is borne out by even a cursory examination of the diversity of conventional fragmentation codes,2 which generally reflect both of these factors. Thus a system devised for an alkaloid file will place heavy emphasis on ring-system skeletons and on the environments of nitrogen atoms, whereas a code devised for a large collection will, of necessity, be more specific and contain a greater number of characteristics than one for a small file.) In terms of size, therefore, it was assumed that searches of larger files require greater selectivity than searches of smaller ones: if every search retrieved a constant proportion of the file, searches of large files could return impractically large numbers of structures. In terms of composition, it was assumed that the queries addressed to a collection would roughly mirror the characteristics of the file; this is again borne out by experience with fragmentation codes,3 and...
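The size assumption can be made concrete with simple arithmetic. Suppose, purely for illustration, that each search retrieves a fixed fraction $p$ of a file of $N$ structures and that a user can practically examine at most $R_{\max}$ answers ($p$, $R_{\max}$, and the larger file size are hypothetical; 30,000 matches the Registry sample size given in the abstract):

```latex
\begin{aligned}
N_{\mathrm{ret}} &= pN,\\
p = 0.01:\quad N = 30\,000 &\Rightarrow N_{\mathrm{ret}} = 300,\\
N = 3\,000\,000 &\Rightarrow N_{\mathrm{ret}} = 30\,000,\\
N_{\mathrm{ret}} \le R_{\max} &\iff p \le \frac{R_{\max}}{N}.
\end{aligned}
```

Holding $R_{\max} = 300$ for the hundredfold larger file requires $p \le 10^{-4}$, that is, a hundredfold increase in selectivity, which is the sense in which larger files call for more specific screens.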
The frequencies of monocycles and of primary rings in 1:1- and 1:2-fused polycycles have been analysed by means of a simple and rapid computer procedure. The analysis covers the great majority of ring systems in a sample of the Chemical Abstracts Registry System. The data are presented in terms of ring size and composition. In each case, the preponderance of six-membered carbocyclic rings is evident.
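A tally of rings by size and composition, as described in this abstract, might be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes ring perception has already been done (including extraction of the primary rings of fused systems), represents each ring only by the element symbols of its atoms, and uses a formula-like composition string chosen here for convenience. The toy data are invented and say nothing about the Registry sample.

```python
from collections import Counter

# Toy representation (an assumption for illustration): each ring is the list of
# element symbols of its member atoms. Bond orders are ignored, so benzene and
# cyclohexane both count as six-membered carbocycles with composition 'C6'.
Ring = list[str]


def composition(ring: Ring) -> str:
    """Formula-like composition string: carbon first, then other elements
    alphabetically, with counts of 1 omitted (e.g. pyridine -> 'C5N')."""
    counts = Counter(ring)
    order = sorted(counts, key=lambda el: (el != "C", el))
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


def ring_census(rings: list[Ring]) -> Counter:
    """Tally rings by (ring size, composition)."""
    return Counter((len(r), composition(r)) for r in rings)


toy_rings: list[Ring] = [
    ["C"] * 6,          # benzene
    ["C"] * 6,          # cyclohexane
    ["C"] * 6,          # one ring of naphthalene
    ["C"] * 5 + ["N"],  # pyridine
    ["C"] * 4 + ["O"],  # furan
]

if __name__ == "__main__":
    census = ring_census(toy_rings)
    for (size, comp), n in census.most_common():
        print(size, comp, n)
```

Keying the tally on (size, composition) lets the same counts be read either by ring size alone or by carbocyclic versus heterocyclic content, the two views in which the abstract reports its data.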