2009
DOI: 10.1021/ci800249s

How Similar Are Similarity Searching Methods? A Principal Component Analysis of Molecular Descriptor Space

Abstract: Different molecular descriptors capture different aspects of molecular structures, but this effect has not yet been quantified systematically on a large scale. In this work, we calculate the similarity of 37 descriptors by repeatedly selecting query compounds and ranking the rest of the database. Euclidean distances between the rank-ordering of different descriptors are calculated to determine descriptor (as opposed to compound) similarity, followed by PCA for visualization. Four broad descriptor classes are i…
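The procedure outlined in the abstract (rank the database per query under each descriptor, compare the rank orderings by Euclidean distance, then project with PCA) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the random matrices standing in for descriptors, the use of Euclidean distance in descriptor space as the compound similarity measure, the choice to run PCA on the rows of the descriptor-distance matrix, and all function names are assumptions made for illustration only.

```python
import numpy as np

def rank_vector(scores):
    """Rank of each compound under the given scores (0 = highest score)."""
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(scores))
    return ranks

def descriptor_rank_profiles(descriptor_matrices, query_indices):
    """One row per descriptor: the rankings for all queries, concatenated."""
    profiles = []
    for X in descriptor_matrices:
        per_query = []
        for q in query_indices:
            # Assumption: compounds closer to the query in this descriptor
            # space (smaller Euclidean distance) are ranked as more similar.
            dist = np.linalg.norm(X - X[q], axis=1)
            others = np.arange(len(X)) != q  # rank the rest of the database
            per_query.append(rank_vector(-dist[others]))
        profiles.append(np.concatenate(per_query))
    return np.asarray(profiles, dtype=float)

def pairwise_euclidean(P):
    """Euclidean distance between every pair of rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def pca_project(M, n_components=2):
    """Project the rows of M onto their leading principal components (via SVD)."""
    centered = M - M.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical data: 20 compounds, three stand-in "descriptors".
rng = np.random.default_rng(0)
base = rng.normal(size=(20, 5))
descriptors = [base, 2.0 * base, rng.normal(size=(20, 8))]

profiles = descriptor_rank_profiles(descriptors, query_indices=[0, 7, 13])
D = pairwise_euclidean(profiles)   # descriptor-by-descriptor distance matrix
coords = pca_project(D)            # 2-D coordinates for plotting
```

In this toy setup the second descriptor is a rescaled copy of the first, so the two produce identical rankings and their descriptor distance is zero, whereas the unrelated third descriptor lies at a nonzero distance from both. Descriptors that rank compounds alike therefore end up close together in the projection, which is the sense of "descriptor (as opposed to compound) similarity" in the abstract.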

Cited by 278 publications (257 citation statements)
References 46 publications
“…It is noteworthy that it is superior to both TAN, COS and TAN-B, all of which have been used in previous studies of similarity searching. We note also that TAN-B is superior to TAN, despite the fact that several previous studies have suggested that weighted-searching is superior to binary searching [10; 36; 37]; however, our results here are in accord with recent work by Bender et al [38].…”
Section: Additional Comparison Of Distance Coefficients
supporting
confidence: 89%
“…Significant progress has been made quantifying and visualizing properties of compound sets (26), including methods that relate structure to intuitive notions of shape (27)(28)(29), and similarity fusion methods (30)(31)(32)(33) that describe relationships between sets. Moreover, chemical similarity and diversity analyses continue to progress (34)(35)(36)(37), including studies using Shannon entropy (38) as a measure of structure information among compounds (39)(40)(41), addressing reagent selection (42), database similarity searches (43), and scaffold diversity (44). Entropy-based methods have also been used on assay data to distinguish single-target compounds from those with multitarget effects (45), and to quantify relationships between targets based on Ki profiles among sets of common inhibitors (46).…”
mentioning
confidence: 99%
“…In this article, we describe how we have used diversity-based library subsets to complement this "screen-all" approach, as our subsets have largely been used to supplement full library screens rather than to replace them. Numerous methods have been described for assessing molecular similarity (e.g., see Bender et al 28 and references therein). Reduced topological representations (Murcko assemblies), ECFPs, and atomic property descriptors (BCUT) have all shown promise in the selection of diverse subsets that retain high levels of hit identification.…”
Section: Discussion
mentioning
confidence: 99%