Functional primordial proteins presumably originated from random sequences, but it is not known how frequently functional, or even folded, proteins occur in collections of random sequences. Here we have used in vitro selection of messenger RNA displayed proteins, in which each protein is covalently linked through its carboxy terminus to the 3′ end of its encoding mRNA¹, to sample a large number of distinct random sequences. Starting from a library of 6 × 10¹² proteins each containing 80 contiguous random amino acids, we selected functional proteins by enriching for those that bind to ATP. This selection yielded four new ATP-binding proteins that appear to be unrelated to each other or to anything found in the current databases of biological proteins. The frequency of occurrence of functional proteins in random-sequence libraries appears to be similar to that observed for equivalent RNA libraries²,³.

The frequency of occurrence of functional proteins in collections of random sequences is an important constraint on models of the evolution of biological proteins. Here we have experimentally determined this frequency by isolating proteins with a specific function from a large random-sequence library of known size. We selected for proteins that could bind a small molecule target with high affinity and specificity as a way of identifying amino-acid sequences that could form a three-dimensional folded state with a well-defined binding site and therefore exhibit an arbitrary specific function. ATP was chosen as the target for binding to allow comparison with known biological ATP-binding motifs and also with previous selections using random-sequence RNA libraries²,³.

Because protein sequences with specific functions are expected to be quite rare in protein sequence space, we prepared a DNA library of 4 × 10¹⁴ independently generated random sequences.
This DNA library was specifically constructed to avoid stop codons and frameshift mutations⁴, and was designed for use in mRNA display¹ selections. This DNA library was then used to generate 6 × 10¹² purified non-redundant random proteins that were used as the input into the first selection step. These proteins contain a contiguous stretch of random amino acids 80 residues in length, long enough to form known protein domains.
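To put these library sizes in perspective, the following minimal sketch compares the 6 × 10¹² input proteins with the number of possible 80-residue sequences, and computes the fraction of the 4 × 10¹⁴ DNA sequences that were recovered as purified proteins. It is an illustration only, not part of the reported analysis. It assumes a uniform 20-letter amino-acid alphabet, which the actual library (designed to avoid stop codons) need not match exactly, and the function and constant names are hypothetical.

```python
import math

# Values taken from the text.
RANDOM_LENGTH = 80        # contiguous random residues per protein
DNA_LIBRARY = 4e14        # independently generated DNA sequences
PROTEIN_LIBRARY = 6e12    # purified non-redundant proteins entering selection

# Assumption for illustration: all 20 amino acids equally likely at each position.
AMINO_ACIDS = 20


def log10_sequence_space(length, alphabet=AMINO_ACIDS):
    """Return log10 of the number of possible sequences of the given length."""
    return length * math.log10(alphabet)


def log10_fraction_sampled(library_size, length, alphabet=AMINO_ACIDS):
    """Return log10 of the fraction of sequence space a library could cover."""
    return math.log10(library_size) - log10_sequence_space(length, alphabet)


def library_yield(protein_count, dna_count):
    """Return the fraction of DNA sequences represented as purified protein."""
    return protein_count / dna_count


if __name__ == "__main__":
    print(f"log10(sequence space): {log10_sequence_space(RANDOM_LENGTH):.1f}")
    print(f"log10(fraction sampled): "
          f"{log10_fraction_sampled(PROTEIN_LIBRARY, RANDOM_LENGTH):.1f}")
    print(f"DNA-to-protein yield: {library_yield(PROTEIN_LIBRARY, DNA_LIBRARY):.3f}")
```

Under these assumptions the sequence space is about 10¹⁰⁴ sequences, so even a library of 6 × 10¹² proteins samples only about 10⁻⁹¹ of it, and roughly 1.5% of the DNA library is represented at the protein level. Working in log₁₀ keeps the numbers readable and avoids handling values near 10¹⁰⁴ directly.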
© 2001 Macmillan Magazines Ltd
Correspondence and requests for materials should be addressed to J.W.S. (szostak@molbio.mgh.harvard.edu). Supplementary information is available on Nature's World-Wide Web site (http://www.nature.com) or as paper copy from the London editorial office of Nature. The DNA sequences encoding the consensus protein sequences of families A, B, C, D, 18predom and clone 18-19 have been deposited in GenBank under accession codes AF306524 to AF306529, respectively.
HHMI Author Manuscript
Unlike other libraries that have been used in protein selections, this random region is not part of a larger structure that would otherwise tend to constrain or bias the conformation of th...