Similarity is a subjective and multifaceted concept, regardless of whether compounds or any other objects are considered. Despite its intrinsically subjective nature, attempts to quantify the similarity of compounds have a long history in chemical informatics and drug discovery. Many computational methods employ similarity measures to identify new compounds for pharmaceutical research. However, chemoinformaticians and medicinal chemists typically perceive similarity in different ways. Similarity methods and numerical readouts of similarity calculations are probably among the most misunderstood computational approaches in medicinal chemistry. Herein, we evaluate different similarity concepts, highlight key aspects of molecular similarity analysis, and address some potential misunderstandings. In addition, a number of practical aspects concerning similarity calculations are discussed.
Activity cliffs are generally defined as pairs of structurally similar compounds having large differences in potency. The analysis of activity cliffs is of general interest because structure-activity relationship (SAR) determinants can often be deduced from them. Critical questions for the study of activity cliffs include how similar compounds should be to qualify as cliff partners, how similarity should be assessed, and how large potency differences between participating compounds should be. Thus far, activity cliffs have mostly been defined on the basis of calculated Tanimoto similarity values using structural descriptors, especially 2D fingerprints. As with any theoretical assessment of molecular similarity, this approach has its limitations. For example, calculated Tanimoto similarities are often difficult to reconcile and interpret from a chemical perspective, a point of critique frequently raised in medicinal chemistry. Herein, we have explored activity cliffs by considering well-defined substructure replacements instead of calculated similarity values. For this purpose, the matched molecular pair (MMP) formalism has been applied. MMPs were systematically derived from public domain compounds, and activity cliffs, termed MMP-cliffs, were extracted from them. The frequency of cliff formation was determined for compounds active against different targets, and MMP-cliffs were analyzed in detail and re-evaluated on the basis of Tanimoto similarity. In many instances, chemically intuitive activity cliffs were only detected on the basis of MMPs, but not Tanimoto similarity.
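To make the conventional, similarity-based cliff definition concrete, the following is a minimal Python sketch and not the method used in the study. It assumes fingerprints are represented as sets of on-bit indices and potencies as negative logarithmic values (for example pKi). The function names and both threshold parameters are hypothetical; the abstract does not state which cutoffs were applied, so the caller has to supply them.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Two empty fingerprints: similarity is undefined; return 0.0 by convention here.
        return 0.0
    return len(fp_a & fp_b) / union


def is_tanimoto_cliff(fp_a: set, fp_b: set,
                      potency_a: float, potency_b: float,
                      min_similarity: float, min_potency_delta: float) -> bool:
    """Flag a compound pair as a similarity-based activity cliff.

    Both thresholds are illustrative assumptions supplied by the caller;
    potencies are assumed to be on a log scale (e.g. pKi).
    """
    similar_enough = tanimoto(fp_a, fp_b) >= min_similarity
    potency_gap = abs(potency_a - potency_b) >= min_potency_delta
    return similar_enough and potency_gap


# Toy data: two fingerprints sharing 3 of 5 distinct bits, potencies 2.5 log units apart.
fp_1 = {1, 2, 3, 4}
fp_2 = {1, 2, 3, 5}
print(tanimoto(fp_1, fp_2))
print(is_tanimoto_cliff(fp_1, fp_2, 9.0, 6.5,
                        min_similarity=0.5, min_potency_delta=2.0))
```

Because the outcome depends entirely on the two cutoffs and on the chosen fingerprint, the same compound pair can be a cliff under one setting and not under another, which illustrates the interpretability problem that motivates a substructure-based criterion such as MMPs.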
We have developed a class of binding proteins, called avimers, to overcome the limitations of antibodies and other immunoglobulin-based therapeutic proteins. Avimers are evolved from a large family of human extracellular receptor domains by in vitro exon shuffling and phage display, generating multidomain proteins with binding and inhibitory properties. Linking multiple independent binding domains creates avidity and results in improved affinity and specificity compared with conventional single-epitope binding proteins. Other potential advantages over immunoglobulin domains include simple and efficient production of multitarget-specific molecules in Escherichia coli, improved thermostability and resistance to proteases. Avimers with sub-nM affinities were obtained against five targets. An avimer that inhibits interleukin 6 with 0.8 pM IC50 in cell-based assays is biologically active in two animal models.
The scaffold hopping potential of popular 2D fingerprints has been thoroughly investigated. We have found that these types of fingerprints have at least limited scaffold hopping ability, including early enrichment of small numbers of active scaffolds at high database ranks. However, it has not been possible to derive Tanimoto coefficient value ranges for individual fingerprints that are generally preferred for scaffold hopping. For selected fingerprints, similarity threshold values have been identified that yield small database selection sets with a high probability of containing a few active scaffolds. Furthermore, essentially all tested fingerprints have shown the ability to enrich scaffold hops in approximately 1% of a screening database. For the test cases reported herein, selecting 0.5-1% of the screening database yields approximately 25% of the available scaffolds. On the basis of our findings, practical guidelines for virtual screening using different types of 2D fingerprints have been formulated.
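The scaffold recovery statistic quoted above (the share of available active scaffolds found in the top-ranked fraction of a database) can be sketched as follows. This is an illustrative calculation under stated assumptions, not the evaluation protocol of the study: the ranking is assumed to be a list ordered best-first, each entry is a scaffold label for an active compound or `None` for an inactive database compound, and the function name `scaffold_recovery` is hypothetical.

```python
def scaffold_recovery(ranked_scaffolds: list, fraction: float) -> float:
    """Fraction of distinct active scaffolds found in the top part of a ranking.

    ranked_scaffolds: best-ranked compound first; each entry is a scaffold
    label for an active compound, or None for an inactive compound.
    fraction: share of the database that is selected (e.g. 0.01 for 1%).
    """
    all_scaffolds = {s for s in ranked_scaffolds if s is not None}
    if not all_scaffolds:
        return 0.0
    # Select at least one compound, rounding to the nearest whole compound.
    n_selected = max(1, round(fraction * len(ranked_scaffolds)))
    found = {s for s in ranked_scaffolds[:n_selected] if s is not None}
    return len(found) / len(all_scaffolds)


# Toy ranking of 100 compounds containing four active scaffolds (A-D).
ranking = ["A", None, "A", None, "B"] + [None] * 93 + ["C", "D"]
print(scaffold_recovery(ranking, 0.05))  # scaffolds A and B lie in the top 5
```

Counting distinct scaffolds rather than active compounds is the point of the measure: the second "A" compound in the toy ranking adds an active hit but no new scaffold.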