When assessing the pharmacological potential of large libraries of compounds, it is often useful to start by determining the biochemical activities of a subset of them. This holds whether the compounds have already been synthesized or exist only in virtual libraries. A suitable subset for this task must be structurally diverse, so as to minimize redundant testing, but must also be representative, so that valuable subgroups are not overlooked. These two requirements are intrinsically in conflict, with gains in one necessarily coming at the expense of the other. Results obtained using optimizable K-dissimilarity selection and clustering are described and compared with those obtained using more traditional agglomerative hierarchical clustering techniques.