Chemoinformatics applications of cluster analysis over the past 35 years include assessing chemical diversity for compound acquisition, analysis of HTS results for lead discovery, 2D and 3D chemical similarity searching for virtual screening, and hypothesis generation for lead hopping using molecular shape and pharmacophore descriptors. These applications still account for the majority of cluster analysis usage, but ever greater computational resources have allowed researchers to tackle applications of increasing scale and complexity. In the past few years, a far broader array of clustering methods has come into use: some entirely new, some common to other disciplines, and others modified for specific chemoinformatic applications. The chemoinformatic applications have also broadened to include more of the biological information commonly associated with bioinformatics. Indeed, clustering techniques commonly found in bioinformatics, such as coclustering or self-organizing trees, are beginning to find chemoinformatic applications. Visualization and validation of clustering results remain challenging, especially as the scale of the problems now attempted has increased enormously. Some new validation techniques introduced in the chemoinformatics literature now allow a better understanding of clustering results and help point to methods of greater efficacy. Effective validation and visualization of clustering results for large data sets, however, has proven more problematic.
WIREs Comput Mol Sci 2014, 4:34–48. doi: 10.1002/wcms.1152
This article is categorized under:
Computer and Information Science > Chemoinformatics