To probe the potential for activity in unevolved amino acid sequence space, we created a third-generation combinatorial library of de novo four-helix bundle proteins. This "artificial superfamily" of helical bundles was designed using binary patterning of polar and nonpolar residues, and was expressed in Escherichia coli from a library of synthetic genes. WA20, a protein picked from the library, is one of the most stable proteins in the superfamily and has rudimentary activities, such as esterase and lipase activity. Here we report the crystal structure of WA20, determined by the multiwavelength anomalous dispersion method. Unexpectedly, the crystal structure shows that WA20 is not a monomeric four-helix bundle but a dimeric one. Each monomer comprises two long α-helices that intertwist with the helices of the other monomer, so that the two monomers together form a 3D domain-swapped four-helix bundle dimer. In addition, the structure contains two hydrophobic pockets, which may provide substrate-binding sites. Small-angle X-ray scattering shows that the molecular weight of WA20 is ~25 kDa and that its shape is rod-like (maximum length, Dmax, of ~8 nm), indicating that WA20 also forms a dimeric four-helix bundle in solution. These results demonstrate that our de novo protein library contains not only simple monomeric proteins, but also stable and functional multimeric proteins.
Combinatorial libraries of synthetic DNA are increasingly being used to identify and evolve proteins with novel folds and functions. An effective strategy for maximizing the diversity of these libraries relies on the assembly of large genes from smaller fragments of synthetic DNA. To optimize library assembly and screening, it is desirable to remove from the synthetic libraries any sequences that contain unintended frameshifts or stop codons. Although genetic selection systems can be used to accomplish this task, the tendency of individual segments to yield misfolded or aggregated products can decrease the effectiveness of these selections. Furthermore, individual protein domains may misfold when removed from their native context. We report the development and characterization of an in vivo system to preselect sequences that encode uninterrupted gene segments regardless of the foldedness of the encoded polypeptide. In this system, the inserted synthetic gene segment is separated from an intein/thymidylate synthase (TS) reporter domain by a polyasparagine linker, thereby permitting the TS reporter to fold and function independently of the folding and function of the segment-encoded polypeptide. TS-deficient Escherichia coli host cells survive on selective medium only if the insert is uninterrupted and in-frame, thereby allowing selection and amplification of desired sequences. We demonstrate that this system can be used as a highly effective preselection tool for the production of large, diverse and high-quality libraries of de novo protein sequences.
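The criterion that the selection enforces, an insert that is in-frame and free of stop codons, can be illustrated with a small in silico filter. This is only a sketch of the logic and not the reported method, which is an in vivo selection through the TS reporter. The function names `is_uninterrupted` and `preselect` are hypothetical, and the code assumes that each segment is read from its first base, that the standard genetic code applies, and that an insert length divisible by three keeps the downstream reporter in frame.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_uninterrupted(segment: str) -> bool:
    """Return True if the segment is in-frame and contains no in-frame stop codon.

    Assumption: the reading frame starts at the first base of the segment.
    """
    seq = segment.upper()
    # An empty insert, or one whose length is not a multiple of three,
    # would shift the frame of anything fused downstream.
    if len(seq) == 0 or len(seq) % 3 != 0:
        return False
    # Split into consecutive in-frame codons and reject any stop codon.
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return all(codon not in STOP_CODONS for codon in codons)


def preselect(library):
    """Keep only the segments that pass the in-frame, no-stop check."""
    return [segment for segment in library if is_uninterrupted(segment)]


if __name__ == "__main__":
    # Toy library: one clean segment, one with an in-frame stop,
    # and one with a single-base deletion (frameshift).
    toy_library = ["ATGGCTAAC", "ATGTAAGCT", "ATGGCTA"]
    print(preselect(toy_library))
```

Like the selection described above, the filter says nothing about whether the encoded polypeptide folds; it tests only whether the reading frame is preserved and uninterrupted.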