Natural products are an important source of bioactive compounds in drug discovery. In this work, we compiled five natural products databases available in the public domain and performed a comprehensive chemoinformatic analysis of scaffold content and diversity, complemented by an overview of diversity based on molecular fingerprints. The natural products databases were compared with each other, with a set of molecules obtained from in-house combinatorial libraries, and with a commercial general screening library. The publicly available natural products databases differed in scaffold diversity. Contrary to the common assumption that larger libraries have greater scaffold diversity, the largest natural products collection analyzed in this work was not the most diverse. The general screening library showed, overall, the highest scaffold diversity. However, considering the most frequent scaffolds, the general reference library was the least diverse. In general, natural products databases in the public domain showed low molecule overlap. In addition to benzene and acyclic compounds, flavones, coumarins, and flavanones were identified as the most frequent molecular scaffolds across the different natural products collections.
The results of this work have direct implications in the computational and experimental screening of natural product databases for drug discovery.
Key words: chemoinformatics, cyclic system, molecular diversity, natural product, Shannon entropy, Traditional Chinese Medicine.
Abbreviations: CSR, cumulative scaffold recovery; MACCS, Molecular ACCess System; MEQI, Molecular Equivalence Index; MOE, molecular operating environment; SE, Shannon entropy; TCM, Traditional Chinese Medicine.
Received 22 June 2012, revised 18 July 2012 and accepted for publication 19 July 2012
Traditionally, natural products have played a major role in drug discovery and development by providing novel chemical scaffolds, and serving as leads or drugs (1,2). For many years, 80% of drugs were either natural products or natural product-derived compounds. Even in the modern era, after the advent of techniques such as high-throughput screening of synthetic libraries, half of the drugs approved since 1994 are based on natural products research (3,4). Recent comprehensive reviews of natural products, or compounds inspired by natural products, indicate that more than 100 natural product compounds are currently in clinical trials. Natural products offer the advantage of discovering novel structural classes (5,6) because of their well-documented better coverage of chemical space relative to large synthetic compounds (7). Hence, the structural or chemical diversity of natural products can be utilized to access bioactive compounds with novel scaffolds (7-9). The inclusion of sources of natural products in drug discovery would open up more avenues for new classes of drugs (10) as well as new chemical entities, evidenced by studies that show structural or chemical complementarity between natural...