2009
DOI: 10.1016/j.jmgm.2009.06.006

Clustering files of chemical structures using the Székely–Rizzo generalization of Ward's method

Abstract: Ward's method is extensively used for clustering chemical structures represented by 2D fingerprints. This paper compares Ward clusterings of 14 datasets (containing between 278 and 4332 molecules) with those obtained using the Székely–Rizzo clustering method, a generalization of Ward's method. The clusters resulting from these two methods were evaluated by the extent to which the various classifications were able to group active molecules together, using a novel criterion of clustering effectiveness. Analysis …
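As a rough illustration of how the two methods in the abstract relate, the sketch below computes the between-cluster "e-distance" that Székely and Rizzo use as a merge cost and checks that, with exponent 2, it equals twice Ward's increase in the within-cluster sum of squares. The function names, the `alpha` parameter name, and the toy coordinates are assumptions made for this example; nothing here reproduces the paper's datasets, fingerprints, or parameter settings.

```python
from math import dist  # Euclidean distance, Python 3.8+


def e_distance(a, b, alpha=1.0):
    """Szekely-Rizzo e-distance between two clusters of points.

    a, b  : lists of equal-length coordinate tuples
    alpha : exponent applied to Euclidean distances (hypothetical name);
            alpha = 2 recovers Ward's criterion up to a factor of 2.
    """
    n1, n2 = len(a), len(b)
    # Mean pairwise distance between the clusters and within each cluster.
    between = sum(dist(x, y) ** alpha for x in a for y in b) / (n1 * n2)
    within_a = sum(dist(x, y) ** alpha for x in a for y in a) / (n1 * n1)
    within_b = sum(dist(x, y) ** alpha for x in b for y in b) / (n2 * n2)
    return (n1 * n2 / (n1 + n2)) * (2 * between - within_a - within_b)


def ward_increase(a, b):
    """Increase in within-cluster sum of squares if a and b are merged."""
    n1, n2 = len(a), len(b)
    mean_a = [sum(c) / n1 for c in zip(*a)]
    mean_b = [sum(c) / n2 for c in zip(*b)]
    return (n1 * n2 / (n1 + n2)) * dist(mean_a, mean_b) ** 2


# Toy clusters (made-up coordinates, not chemical fingerprints).
cluster_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
cluster_b = [(4.0, 4.0), (5.0, 3.0)]

print(e_distance(cluster_a, cluster_b, alpha=2.0))  # twice the Ward value
print(ward_increase(cluster_a, cluster_b))
print(e_distance(cluster_a, cluster_b, alpha=1.0))  # a non-Ward member of the family
```

In an agglomerative procedure, the pair of clusters with the smallest merge cost would be joined at each step; choosing an exponent other than 2 is what distinguishes the generalization from plain Ward clustering.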

Cited by 43 publications (30 citation statements)
References: 32 publications
“…[34] The additional searches were carried out using: the same BIN but with the standard belief function of Chen et al (there denoted by STD); the same BIN but with the language modelling smoothing function of Chen et al (there denoted by SMO); the BIN and belief function described by Abdo and Salim; [35] a conventional Tanimoto coefficient using binary fingerprints; a Tanimoto coefficient using fingerprints containing counts of the frequencies with which each bit occurs; a Tanimoto coefficient using fingerprints containing the square roots of the frequencies with which each bit occurs, as suggested by Arif et al; [41] and the Soergel coefficient using fingerprints containing counts of the frequencies with which each bit occurs, as suggested by Varin et al. [42] The relative performance of the systems varied somewhat from dataset to dataset, but with the Soergel-based system best overall. There was no such variation in the relative performance of the fusion rules, with rCombRKP giving the best results for all four datasets, as exemplified by the results listed in Table 7 for DS1.…”
Section: Results (citation type: mentioning)
confidence: 99%
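The coefficients named in the citation statement above can be sketched in a few lines. This is a minimal illustration that assumes the standard textbook definitions of the Tanimoto coefficient (for binary or weighted vectors) and the Soergel distance; the function names, the zero-vector convention, and the example fingerprints are invented for the example and are not taken from the cited studies.

```python
from math import sqrt


def tanimoto(x, y):
    """Tanimoto similarity: x.y / (x.x + y.y - x.y).

    Works for binary fingerprints and for count-weighted fingerprints.
    Returning 0.0 for two all-zero vectors is an assumed convention.
    """
    xy = sum(a * b for a, b in zip(x, y))
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    denom = xx + yy - xy
    return xy / denom if denom else 0.0


def soergel(x, y):
    """Soergel distance: sum|x_i - y_i| / sum max(x_i, y_i)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    denom = sum(max(a, b) for a, b in zip(x, y))
    return num / denom if denom else 0.0


# Hypothetical fragment-occurrence counts for two molecules.
counts_a = [3, 0, 1, 4]
counts_b = [1, 2, 1, 0]

# Binary fingerprints record only presence or absence of each fragment.
binary_a = [1 if c else 0 for c in counts_a]
binary_b = [1 if c else 0 for c in counts_b]

# Square-root weighting damps the influence of very frequent fragments.
sqrt_a = [sqrt(c) for c in counts_a]
sqrt_b = [sqrt(c) for c in counts_b]

print(tanimoto(binary_a, binary_b))  # binary Tanimoto
print(tanimoto(counts_a, counts_b))  # count-based Tanimoto
print(tanimoto(sqrt_a, sqrt_b))      # square-root-weighted Tanimoto
print(soergel(counts_a, counts_b))   # count-based Soergel distance
```

For binary fingerprints the Soergel distance is the complement of the Tanimoto similarity, so the two only rank molecules differently once counts or other weights are used.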
“…To do so, many criteria have been described like partitioning methods, hierarchical clustering, etc. One of the most widespread hierarchical clustering methods is Ward's method [56][57][58][59][60][61][62][63][64]. According to Hands and Everitt [64], this method achieves better results than other hierarchical methods (single-link, complete linkage, median, average linkage, etc.)…”
Section: Q4 (citation type: mentioning)
confidence: 99%
“…The high value occurs when the actives are clustered tightly together and separated from the inactive molecules. The QPI is defined as: [8] QPI…”
Section: Performance Evaluation (citation type: mentioning)
confidence: 99%