2021
DOI: 10.1186/s13321-021-00504-4

Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 2: speed, consistency, diversity selection

Abstract: Despite being a central concept in cheminformatics, molecular similarity has so far been limited to the simultaneous comparison of only two molecules at a time and to a single index, generally the Tanimoto coefficient. In a recent contribution we not only introduced a complete mathematical framework for extended similarity calculations (i.e., comparisons of more than two molecules at a time) but also defined a series of novel indices. Part 1 is a detailed analysis of the effects of various parameters on the simila…
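To make "comparing more than two molecules at once" concrete, here is a minimal Python sketch of a non-weighted extended Jaccard-Tanimoto index. It assumes a column-sum formulation: for n binary fingerprints, a bit position with column sum σ counts as a 1-similarity when 2σ − n > γ, as a dissimilarity when |2σ − n| ≤ γ, and otherwise as a 0-similarity. The coincidence threshold γ is assumed to default to n mod 2. The weighting functions and the other index variants of the framework are not reproduced, and the function name `extended_jt` is hypothetical.

```python
from typing import Optional, Sequence


def extended_jt(fps: Sequence[Sequence[int]], gamma: Optional[int] = None) -> float:
    """Non-weighted extended Jaccard-Tanimoto index for n >= 2 binary fingerprints.

    Simplified sketch (assumed formulation): each bit position is classified
    from its column sum as a 1-similarity, 0-similarity, or dissimilarity count.
    """
    n = len(fps)
    if n < 2:
        raise ValueError("need at least two fingerprints")
    if gamma is None:
        gamma = n % 2  # assumed default coincidence threshold
    a = d = 0  # 1-similarity and dissimilarity counters
    for column in zip(*fps):  # one column = one bit position across all n objects
        delta = 2 * sum(column) - n  # > 0: mostly 1s, < 0: mostly 0s
        if abs(delta) <= gamma:
            d += 1  # no clear majority: counted as a dissimilarity
        elif delta > 0:
            a += 1  # clear majority of 1s: counted as a 1-similarity
        # a clear majority of 0s is a 0-similarity, which Tanimoto-type indices ignore
    return a / (a + d) if a + d else 0.0


# With two fingerprints this reduces to the ordinary Tanimoto coefficient (1/3 here).
print(extended_jt([[1, 1, 0, 0], [1, 0, 1, 0]]))
# Three fingerprints are compared in a single call, with no pairwise loop.
print(extended_jt([[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]))
```

For n = 2 and γ = 0, a bit set in both vectors is a 1-similarity and a bit set in only one is a dissimilarity, so a / (a + d) is exactly the usual intersection-over-union Tanimoto value.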

Cited by 44 publications (60 citation statements)
References 45 publications
“…The framework that we introduced here provides a new alternative, which allows to simultaneously compare more than two dichotomous vectors. This scales in order O(N), presenting a tremendous speed gain: this is further discussed in the accompanying paper [22]. Applications include subset selection, clustering, diversity picking or we can even apply this methodology to estimate the diversities of entire compound libraries.…”
Section: Discussion · mentioning
confidence: 99%
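The O(N) scaling quoted above follows from the fact that an extended index needs only the per-bit column sums of the whole set. The sketch below, assuming the same non-weighted extended Jaccard-Tanimoto formulation (γ = n mod 2) and using hypothetical helper names, contrasts one pass over the data with the N(N − 1)/2 Tanimoto evaluations needed to average similarity over all pairs.

```python
from typing import List, Sequence


def column_sums(fps: Sequence[Sequence[int]]) -> List[int]:
    """Single pass over N fingerprints of M bits: O(N * M) work."""
    sums = [0] * len(fps[0])
    for fp in fps:
        for k, bit in enumerate(fp):
            sums[k] += bit
    return sums


def extended_jt_from_sums(sums: Sequence[int], n: int) -> float:
    """Non-weighted extended Jaccard-Tanimoto computed from the column sums alone."""
    gamma = n % 2  # assumed default coincidence threshold
    a = sum(1 for s in sums if 2 * s - n > gamma)  # 1-similarity columns
    d = sum(1 for s in sums if abs(2 * s - n) <= gamma)  # dissimilarity columns
    return a / (a + d) if a + d else 0.0


def pairwise_comparisons(n: int) -> int:
    """Tanimoto evaluations needed to average similarity over all pairs: O(N^2)."""
    return n * (n - 1) // 2


fps = [[1, 0, 1], [1, 1, 0]]
print(extended_jt_from_sums(column_sums(fps), len(fps)))
print(pairwise_comparisons(10_000))  # 49995000 pairs for 10,000 molecules
```

Adding one more fingerprint only updates the M running sums, so the extended index of a growing set can be refreshed without revisiting earlier members.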
“…The theoretical SRD distributions were defined for different sample sizes up to 13. The theoretical SRD can be well approximated with a Gaussian distribution, if the (22)…”
Section: Development of Sum of Ranking Differences (SRD) · mentioning
confidence: 99%
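For readers unfamiliar with sum of ranking differences (SRD): the objects are ranked by each method's scores and again by a reference, and SRD is the sum of absolute rank differences, so a smaller value means closer agreement with the reference. The sketch below assumes the row-wise mean as the reference (a common choice) and average ranks for ties; it does not reproduce the theoretical SRD distributions or the Gaussian approximation mentioned in the quote. All function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ascending ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks


def srd(method: Sequence[float], reference: Sequence[float]) -> float:
    """Sum of absolute rank differences between a method and the reference."""
    if len(method) != len(reference):
        raise ValueError("method and reference must score the same objects")
    return sum(abs(a - b) for a, b in zip(average_ranks(method), average_ranks(reference)))


def srd_against_row_mean(matrix: Sequence[Sequence[float]]) -> List[float]:
    """SRD of every column (method), using the row-wise mean as the reference."""
    reference = [sum(row) / len(row) for row in matrix]
    return [srd([row[c] for row in matrix], reference) for c in range(len(matrix[0]))]


# Rows are objects, columns are methods (toy numbers for illustration only).
scores = [[9, 8, 1], [7, 6, 3], [4, 5, 9], [2, 1, 6]]
print(srd_against_row_mean(scores))
```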
“…This algorithm improves upon our Max_nDis picker [21] because step 2.1 allows for a more thorough selection of the diverse conformations by effectively serving as a tiebreaker between conformations with the same extended similarity with respect to the preselected set. That is, we still use the minimization of the extended similarity (step 2) as the driving force of the algorithm, but step 2.1 adds an extra layer that leads to an even more diverse set in the end.…”
Section: Methods · mentioning
confidence: 99%
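The quote describes a picker driven by minimizing the extended similarity with respect to the preselected set (step 2), plus a tiebreaker (step 2.1) whose criterion is not given in this excerpt. The sketch below illustrates only the driving step, under the same assumed non-weighted extended Jaccard-Tanimoto index: starting from a seed, it repeatedly adds the candidate that yields the lowest extended similarity for the enlarged set. It is not the Max_nDis implementation; ties simply keep the lowest index as a placeholder, and all names are hypothetical.

```python
from typing import List, Sequence


def _extended_jt(sums: Sequence[int], n: int) -> float:
    # Non-weighted extended Jaccard-Tanimoto from column sums (assumed gamma = n mod 2).
    gamma = n % 2
    a = sum(1 for s in sums if 2 * s - n > gamma)
    d = sum(1 for s in sums if abs(2 * s - n) <= gamma)
    return a / (a + d) if a + d else 0.0


def greedy_min_ext_sim_pick(fps: Sequence[Sequence[int]], n_pick: int, seed: int = 0) -> List[int]:
    """Hypothetical greedy picker: grow the set by the candidate that minimizes
    the extended similarity of (selected + candidate)."""
    selected = [seed]
    chosen = {seed}
    sums = list(fps[seed])  # running column sums of the selected set
    while len(selected) < min(n_pick, len(fps)):
        best, best_score = -1, float("inf")
        for i, fp in enumerate(fps):
            if i in chosen:
                continue
            trial = [s + b for s, b in zip(sums, fp)]  # O(M) update, no pairwise work
            score = _extended_jt(trial, len(selected) + 1)
            if score < best_score:  # strict '<': ties keep the lowest index (placeholder tiebreak)
                best, best_score = i, score
        selected.append(best)
        chosen.add(best)
        sums = [s + b for s, b in zip(sums, fps[best])]
    return selected


fps = [[1, 1, 1, 1], [1, 1, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1]]
print(greedy_min_ext_sim_pick(fps, 3))  # [0, 2, 1]
```

Each candidate is scored from the running column sums in O(M) for M-bit fingerprints, without comparing it against each already-selected member individually.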
“…The algorithm is inspired by the diversity pickers commonly applied in cheminformatics to sample large chemical spaces, usually based on the use of binary molecular fingerprints. [18] The various versions of the extended similarity indices [18–20] have shown great promise in the problems of diversity selection [21] and exploration of large and various data sets [22, 23] including complex biological ensembles. [24] The keys to this success are the ability of the extended indices to quantify similarities between any number of objects and the fact that they can do so with linear scaling.…”
Section: Introduction · mentioning
confidence: 99%
“…Extended similarity analysis: from pair of molecules, to chemical space and beyond [22,23] Jürgen Bajorath University of Bonn (Germany)…”
Section: Introduction · mentioning
confidence: 99%