Using a diverse collection of small molecules, we recently found that compound sets from different sources (commercial; academic; natural) have different protein-binding behaviors, and these behaviors correlate with trends in stereochemical complexity for these compound sets. These results lend insight into structural features that synthetic chemists might target when synthesizing screening collections for biological discovery. We report extensive characterization of structural properties and diversity of biological performance for these compounds and expand comparative analyses to include physicochemical properties and three-dimensional shapes of predicted conformers. The results highlight additional similarities and differences between the sets, but also the dependence of such comparisons on the choice of molecular descriptors. Using a protein-binding dataset, we introduce an information-theoretic measure to assess diversity of performance with a constraint on specificity. Rather than relying on finding individual active compounds, this measure allows rational judgment of compound subsets as groups. We also apply this measure to publicly available data from ChemBank for the same compound sets across a diverse group of functional assays. We find that performance diversity of compound sets is relatively stable across a range of property values as judged by this measure, both in protein-binding studies and functional assays. Because building screening collections with improved performance depends on efficient use of synthetic organic chemistry resources, these studies illustrate an important quantitative framework to help prioritize choices made in building such collections.

A central theme in applying cheminformatics to discovery chemistry is to relate synthetic decisions to consequences on both chemical structure and biological assay performance.
Historically, such efforts focused on small sets of similar compounds and single performance measurements (1-3), providing guidance to chemists in compound optimization against single-target proteins or processes (4). However, additional methods are needed to judge large sets of compounds, such as those used in small-molecule screening. Progress toward more valuable screening collections (5) requires unbiased methods to evaluate diversity of assay performance for compound sets rather than performance of individual members.

A widely used method to judge compounds for drug discovery is the "rule of 5" (RO5) (6), which predicts poor absorption or permeation for compounds that deviate from property-value constraints: H-bond donors (Hd) and acceptors (Ha), molecular weight (MW), and calculated partition coefficients (cLogP). Recent studies have attempted to refine such rules (7-9) and extend them to other goals (10-13), such as making leads or probes. Such property filters have been debated and reviewed (14-16), and their long-term impact on pharmaceutical research is starting to be analyzed (17, 18). Importantly, exceptions to these rules, including natural products (19-21), ...