Using a diverse collection of small molecules generated from a variety of sources, we measured the protein-binding activity of each compound against each of 100 diverse (sequence-unrelated) proteins using small-molecule microarrays. We also analyzed structural features of the small molecules, including their complexity. We found that compounds from different sources (commercial, academic, natural) have different protein-binding behaviors and that these behaviors correlate with general trends in stereochemical and shape descriptors for these compound collections. Compared with compounds from commercial sources, which make up the majority of current screening collections, compounds with a higher content of sp3-hybridized and stereogenic atoms showed improved binding selectivity and frequency. The results suggest structural features that synthetic chemists can target when synthesizing screening collections for biological discovery. Because selective protein binding can be a key feature of high-value probes and drugs, synthesizing compounds that have the features identified in this study may improve the performance of screening collections.

Small-molecule probe- and drug-discovery efforts in academia and the pharmaceutical industry often begin with high-throughput screening. Many thousands of small molecules are tested with the expectation that each has potential as a discovery lead. Thus, assembling or synthesizing compound collections for small-molecule screening is an important step toward discovery success, particularly when selecting among compounds from a variety of synthetic and natural sources.
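The comparison summarized above, relating sp3 and stereogenic-atom content to binding frequency and selectivity across compound sources, can be made concrete with a small sketch. The code below computes two per-compound descriptors, the fraction of carbons that are sp3-hybridized (commonly called Fsp3) and the fraction of carbons that are stereogenic, and pairs them with a binary protein-binding profile to estimate binding frequency and selectivity for each source. This is a minimal illustration under stated assumptions, not the authors' analysis: the `Atom` record, the `summarize_by_source` helper, the carbon-only denominator for the stereogenic fraction, and the rule that a binder counts as "selective" only if it binds exactly one protein are all hypothetical. In practice the descriptors would be computed from full structures with a cheminformatics toolkit, and the binding calls would come from the microarray data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical minimal atom record; real work would use full structures."""
    element: str               # element symbol, e.g. "C", "N", "O"
    hybridization: str         # "sp3", "sp2", or "sp"
    stereogenic: bool = False  # True if this atom is a stereocenter


def _carbons(atoms):
    """Return only the carbon atoms, which form the denominator below."""
    return [a for a in atoms if a.element == "C"]


def fraction_sp3(atoms):
    """Fsp3: sp3-hybridized carbons divided by all carbons (0.0 if no carbons)."""
    carbons = _carbons(atoms)
    if not carbons:
        return 0.0
    return sum(a.hybridization == "sp3" for a in carbons) / len(carbons)


def fraction_stereogenic(atoms):
    """Stereogenic carbons divided by all carbons (assumed definition)."""
    carbons = _carbons(atoms)
    if not carbons:
        return 0.0
    return sum(a.stereogenic for a in carbons) / len(carbons)


def summarize_by_source(compounds):
    """Group (source, atoms, binding_profile) records and average per source.

    binding_profile holds one bool per protein (True = binds). A compound is
    a "binder" if it binds at least one protein; among binders, it counts as
    "selective" (hypothetical cutoff) if it binds exactly one protein.
    """
    groups = {}
    for source, atoms, profile in compounds:
        groups.setdefault(source, []).append((atoms, profile))

    summary = {}
    for source, items in groups.items():
        n = len(items)
        hit_counts = [sum(profile) for _, profile in items]
        binders = [h for h in hit_counts if h > 0]
        summary[source] = {
            "mean_fsp3": sum(fraction_sp3(a) for a, _ in items) / n,
            "mean_fstereo": sum(fraction_stereogenic(a) for a, _ in items) / n,
            "binding_frequency": len(binders) / n,
            "selective_fraction": (
                sum(h == 1 for h in binders) / len(binders) if binders else 0.0
            ),
        }
    return summary


if __name__ == "__main__":
    # Toy data invented for illustration: a flat aromatic compound and a
    # stereochemically rich one, each profiled against three proteins.
    flat = [Atom("C", "sp2")] * 6 + [Atom("O", "sp3")]
    rich = [Atom("C", "sp3", stereogenic=True)] * 2 + [Atom("C", "sp3"), Atom("C", "sp2")]
    records = [
        ("CC", flat, [True, True, False]),
        ("NP", rich, [False, True, False]),
    ]
    for source, stats in summarize_by_source(records).items():
        print(source, stats)
```

Averaging per-compound fractions, rather than pooling atoms across an entire source, weights every compound equally regardless of its size, which keeps a few very large natural products from dominating a source's average.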
Unbiased methods that evaluate the assay performance of compounds from different sources, and that relate performance to chemical structure (defined by computed structural properties) (1, 2), can guide one aspect of building more valuable small-molecule screening collections. Comparative analyses of compounds often rely on cheminformatic analysis of compound structures (3-5) or on retrospective analysis of compound performance by mining the literature (6-8) or historical data (9, 10). For example, intermediate molecular complexity has been proposed as theoretically preferable for drug leads (11), and this relationship is supported by evidence mined from historical data (9). In this study, we performed unbiased comparisons of compounds from natural and synthetic sources by first identifying compounds with unknown activities and then testing them on a common assay platform. We identified a compound collection comprising three subsets: (i) 6,152 compounds from commercial sources that are representative of many common screening collections (commercial compounds; CC); (ii) 6,623 compounds assembled from the academic synthetic chemistry community and made using, e.g., diversity-oriented synthesis (diverse compounds; DC); and (iii) 2,477 naturally occurring compounds (natural products; NP). We then (i) analyzed distributions of stereochemical and shape complexity for each set; (ii) measured protein-binding activities of each membe...