Background

Disulfide-rich peptides (DRPs) are found throughout nature. They are suitable scaffolds for drug development due to their small cores, whose disulfide bonds impart extraordinary chemical and biological stability. A challenge in developing a DRP therapeutic is to engineer binding to a specific target. This challenge can be overcome by (i) sampling the large sequence space of a given scaffold through a phage display library and by (ii) panning multiple libraries encoding structurally distinct scaffolds. Here, we implement a protocol for defining these diverse scaffolds, based on clustering structurally defined DRPs according to their conformational similarity.

Results

We developed and applied a hierarchical clustering protocol based on DRP structural similarity, followed by two post-processing steps, to classify 806 unique DRP structures into 81 clusters. The 20 most populated clusters comprised 85% of all DRPs. Representative scaffolds were selected from each of these clusters; the representatives were structurally distinct from one another, but similar to other DRPs in their respective clusters. To demonstrate the utility of the clusters, phage libraries were constructed for three of the representative scaffolds and panned against interleukin-23. One library produced a peptide that bound to this target with an IC50 of 3.3 μM.

Conclusions

Most DRP clusters contained members that were diverse in sequence, host organism, and interacting proteins, indicating that cluster members were functionally diverse despite having similar structure. Only 20 peptide scaffolds accounted for most of the natural DRP structural diversity, providing suitable starting points for seeding phage display experiments. Through selection of the scaffold surface to vary in phage display, libraries can be designed that present sequence diversity in architecturally distinct, biologically relevant combinations of secondary structures.
We supported this hypothesis with a proof-of-concept experiment in which three phage libraries were constructed and panned against the IL-23 target, resulting in a single-digit μM hit. This result suggests that a collection of libraries based on the full set of 20 scaffolds increases the potential to efficiently identify peptide binders to a protein target in a drug discovery program.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1350-9) contains supplementary material, which is available to authorized users.
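
Illustrative sketch

The general idea of hierarchical clustering on structural dissimilarity, followed by selection of one representative per cluster, can be sketched as below. This is a minimal illustration under stated assumptions, not the protocol used in the article: the average-linkage criterion, the cutoff value, the medoid rule for choosing representatives, and the toy dissimilarity matrix are all hypothetical, and the two post-processing steps mentioned in the Results are not reproduced.

```python
from itertools import combinations


def average_linkage(dist, cutoff):
    """Agglomerative clustering on a symmetric dissimilarity matrix.

    Repeatedly merges the two clusters with the smallest average
    pairwise dissimilarity and stops once that value exceeds `cutoff`.
    (Average linkage and the cutoff are assumptions for illustration.)
    """
    clusters = [[i] for i in range(len(dist))]

    def avg(a, b):
        # Mean dissimilarity over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (x, y), d = min(
            (((a, b), avg(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break
        # x < y always holds, so deleting y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def medoid(cluster, dist):
    """Member with the smallest summed dissimilarity to its cluster."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


# Toy, made-up dissimilarities between five hypothetical structures
# (0 = identical, 1 = unrelated); not data from the article.
dist = [
    [0.00, 0.10, 0.20, 0.90, 0.80],
    [0.10, 0.00, 0.15, 0.85, 0.90],
    [0.20, 0.15, 0.00, 0.80, 0.95],
    [0.90, 0.85, 0.80, 0.00, 0.10],
    [0.80, 0.90, 0.95, 0.10, 0.00],
]

clusters = average_linkage(dist, cutoff=0.5)
representatives = [medoid(c, dist) for c in clusters]
print(clusters, representatives)
```

With this toy matrix, the five structures fall into two groups, and each representative is the member closest on average to the rest of its group, mirroring the requirement that representatives be similar to their own cluster yet distinct from one another.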