1997
DOI: 10.1016/s1093-3263(98)00008-4

Comparison of algorithms for dissimilarity-based compound selection

citations
Cited by 191 publications
(156 citation statements)
references
References 18 publications
“…We have already noted that the identification of the n most diverse molecules in a dataset containing N molecules is generally infeasible for non-trivial values of n and N (but see Section 4 below for an exception to this general rule), and practicable approaches to dissimilarity-based compound selection hence involve approximate methods that are not guaranteed to result in the identification of the most dissimilar possible subset (see, e.g., Bawden, 1993; Clark, 1997; Hudson et al, 1996; Lajiness, 1990; Marengo and Todeschini, 1992; Nilakantan et al, 1997; Pickett et al, 1998; Polinsky et al, 1996); that said, there is evidence to suggest that the subsets identified are only marginally sub-optimal (Gillet et al, 1997). Thus far, two major classes of algorithm have been described: maximum-dissimilarity algorithms and sphere-exclusion algorithms (Snarey et al, 1998). The basic maximum-dissimilarity algorithm for selecting a size-n Subset from a size-N Dataset is shown in Figure 1. This algorithm, which was first described by Kennard and Stone (1969) and which was applied to compound selection by Lajiness (1990) and Bawden (1993), permits many variants depending upon the precise implementation of Steps 1 and 3.…”
Section: Selection Of Compounds From a Database (mentioning)
confidence: 99%
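The excerpt above refers to a Figure 1 that is not reproduced on this page, so the following is only a minimal sketch of a greedy maximum-dissimilarity selection, not the published procedure. It assumes that Step 1 initialises the subset with a seed compound and that Step 3 picks the next compound by the MaxMin criterion. The function name `maxmin_select`, the use of coordinate tuples in place of molecular descriptors, and Euclidean distance as the dissimilarity measure are all illustrative assumptions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def maxmin_select(points, n, seed_index=0):
    """Greedily pick n indices from points (hypothetical helper).

    Assumed Step 1: start the subset with the compound at seed_index.
    Assumed Step 3: add the compound whose distance to its nearest
    already-selected neighbour is largest (MaxMin).
    """
    if not 0 < n <= len(points):
        raise ValueError("n must be between 1 and len(points)")
    selected = [seed_index]
    # nearest[i] holds the distance from point i to its closest selected point.
    nearest = [dist(p, points[seed_index]) for p in points]
    while len(selected) < n:
        chosen = set(selected)
        best = max(
            (i for i in range(len(points)) if i not in chosen),
            key=lambda i: nearest[i],
        )
        selected.append(best)
        # Refresh nearest-selected distances in one pass over the dataset.
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], dist(p, points[best]))
    return selected


toy = [(0, 0), (1, 0), (10, 0), (5, 0)]
print(maxmin_select(toy, 3))
```

Changing how the seed is chosen, or swapping the `key` used to rank candidates, gives the kind of variants the excerpt attributes to Steps 1 and 3.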
“…Holliday et al (1995) described a MaxSum selection algorithm with a time complexity of O(nN), using an equivalence that had been developed for the rapid implementation of hierarchic agglomerative document clustering using the group-average clustering method (Voorhees, 1986). However, an analysis of the MaxSum definition by Agrafiotis and Lobanov (1999) suggested that it could result in subsets containing groups of closely-related molecules, and this limitation was subsequently demonstrated by Snarey et al (1998) (Higgs et al, 1997; Polinsky et al, 1996) and the comparative evaluation of Snarey et al (1998) showed it to be more effective than MaxSum in identifying database subsets exhibiting a range of biological activities; accordingly, it is probably the method of choice for this class of selection algorithms.…”
Section: Insert Figure 1 About Here (mentioning)
confidence: 99%
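To illustrate the MaxSum criterion mentioned in the excerpt above, here is a hedged sketch in which each candidate is scored by the sum of its dissimilarities to everything already selected. It keeps a running total per compound so that each of the n rounds costs one pass over the N compounds. This plain running sum is a stand-in chosen for illustration; it is not the equivalence used by Holliday et al, which is not described on this page. The name `maxsum_select`, the toy coordinates, and Euclidean distance are assumptions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def maxsum_select(points, n, seed_index=0):
    """Greedy MaxSum selection (hypothetical helper, illustrative only)."""
    if not 0 < n <= len(points):
        raise ValueError("n must be between 1 and len(points)")
    selected = [seed_index]
    # total[i] accumulates the distance from point i to every selected point.
    total = [dist(p, points[seed_index]) for p in points]
    while len(selected) < n:
        chosen = set(selected)
        best = max(
            (i for i in range(len(points)) if i not in chosen),
            key=lambda i: total[i],
        )
        selected.append(best)
        # One O(N) update per round gives O(nN) work overall.
        for i, p in enumerate(points):
            total[i] += dist(p, points[best])
    return selected


toy = [(0, 0), (10, 0), (10, 1), (5, 0)]
print(maxsum_select(toy, 3))  # [0, 2, 1]
```

On this toy set the third pick is index 1, which sits one unit away from the already selected index 2, while the more central point at index 3 is passed over. That is a small-scale instance of the tendency, noted in the excerpt, for MaxSum to admit groups of closely related items.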
“…In subsequent stages, that non-excluded molecule is chosen for inclusion in the subset that has the largest dissimilarity to those molecules that have already been selected, and further molecules excluded if they are nearest neighbours of the one that has been chosen [85] (other approaches have also been described [86]). These approaches involve the identification of the most dissimilar molecule at each stage, and different results can be obtained depending on how 'most dissimilar' is defined: the MaxMin approach is widely used, and involves selecting that molecule for inclusion that has the maximum dissimilarity to its nearest neighbour in the current subset of selected molecules [87].…”
Section: Molecular Diversity Analysis (mentioning)
confidence: 99%
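The selection-with-exclusion procedure described in the excerpt above can be sketched as follows. This is one possible reading, under stated assumptions: "nearest neighbours" is interpreted as every compound within a fixed `radius` of the chosen one, the first compound is a caller-supplied seed, and "largest dissimilarity to those already selected" is interpreted with the MaxMin criterion. The name `sphere_exclusion_select`, the radius threshold, and Euclidean distance on coordinate tuples are all hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def sphere_exclusion_select(points, radius, seed_index=0):
    """Pick diverse points, excluding neighbours of each pick (sketch)."""
    remaining = set(range(len(points)))
    nearest = [float("inf")] * len(points)
    selected = []
    current = seed_index
    while True:
        selected.append(current)
        # Drop the chosen point and everything inside its exclusion sphere.
        remaining = {
            i for i in remaining
            if dist(points[i], points[current]) > radius
        }
        if not remaining:
            return selected
        # Track each survivor's distance to its nearest selected point.
        for i in remaining:
            nearest[i] = min(nearest[i], dist(points[i], points[current]))
        # MaxMin choice among the non-excluded points; sorting fixes tie order.
        current = max(sorted(remaining), key=lambda i: nearest[i])


toy = [(0, 0), (1, 0), (10, 0), (5, 0), (9, 0)]
print(sphere_exclusion_select(toy, radius=1.5))  # [0, 2, 3]
```

Unlike a fixed-size selection, the size of the result here is governed by `radius`: a larger exclusion sphere removes more candidates per pick and yields a smaller subset.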
“…In each case, ten representative reference structures from an activity class were chosen for searching: the choices were made using a MaxMin diversity selection procedure, to ensure that the reference structures covered the full range of structural types within each activity class [40]. The numbers of actives retrieved in these similarity searches then averaged over the ten reference structures, using cut-offs of the top-1% and the top-5% of the similarity rankings.…”
Section: Datasets (mentioning)
confidence: 99%