1973
DOI: 10.1021/c160050a013

Strategic Considerations in the Design of a Screening System for Substructure Searches of Chemical Structure Files

Abstract: A major problem in the design of screening systems for substructure searches of chemical structure files is the development of a methodology for selection of an optimal set of structural characteristics to act as screens. The set chosen for a particular application will depend on the characteristics of the collection, as well as on its size and growth rate. A strategy which takes account of the disparate frequencies of the various species of fragments in a data-base by use of differential, and, in part, hierar…

Cited by 70 publications
(45 citation statements)
“…1). It is similar to augmented atoms 12 or the extended connectivity circular fingerprints (ECFPs) employed by Scitegic (San Diego, CA). 13 Feature selection is performed using information-gain-based feature selection, 13 which was originally devised to induce rules as nodes of decision trees.…”
mentioning
confidence: 99%
“…This important ligand-based drug discovery methodology and classification approach are associated with the following two fundamental computational problems. (1) The notion of similarity used in search determines the molecules that are extracted from the database. A notion of similarity which has the highest level of bioactivity discrimination is very desirable and needs to be determined computationally.…”
Section: Introduction
mentioning
confidence: 99%
“…Towards this goal, a number of different approaches have been developed that represent each compound by a set of descriptors that are based on frequency, physiochemical properties as well as topological and geometric substructures (fragments) [1,3,6,8,13,28-30,36]. Historically, the best performing and most widely used descriptors have been based on fingerprints, which represent each molecular graph by a fixed length bit-vector derived by enumerating all bounded length cycles and paths in the graph (e.g., Daylight [29]), and on sets of fragments that have been identified a priori by domain experts (e.g., Maccs keys [30]).…”
Section: Introduction
mentioning
confidence: 99%
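
The fingerprint idea in the statement above, and its use as a screen for substructure search as described in the abstract, can be illustrated with a minimal sketch. This is a toy version written for illustration only: it is not the Daylight algorithm or the method of the 1973 paper. The function names (`path_fingerprint`, `passes_screen`), the 64-bit width, the CRC32 hash, and the decision to ignore bond orders and ring closures are all assumptions made here.

```python
import zlib


def path_fingerprint(atoms, bonds, max_len=3, n_bits=64):
    """Hash every simple path of up to max_len bonds into an n_bits-wide bit vector.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs; bond order is ignored (simplification)
    Returns the bit vector as a Python int.
    """
    adjacency = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    bits = 0

    def walk(path):
        nonlocal bits
        labels = [atoms[i] for i in path]
        # Use one canonical direction so a path and its reverse set the same bit.
        key = min("-".join(labels), "-".join(reversed(labels)))
        bits |= 1 << (zlib.crc32(key.encode()) % n_bits)
        # Extend the path while it has fewer than max_len bonds.
        if len(path) - 1 < max_len:
            for nxt in adjacency[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return bits


def passes_screen(query_bits, molecule_bits):
    """A molecule can contain the query only if it has every bit the query sets."""
    return (query_bits & molecule_bits) == query_bits


# Hypothetical example: ethanol skeleton (C-C-O) screened against a C-O query.
ethanol = path_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
query = path_fingerprint(["C", "O"], [(0, 1)])
candidate = passes_screen(query, ethanol)
```

Because every path in the query also occurs in any molecule that contains it, a true substructure match always passes the screen. The reverse does not hold: hash collisions can let non-matching molecules through, so a screen of this kind only narrows the candidate set before a full atom-by-atom match.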