The binding and catalytic functions of proteins are generally mediated by a small number of functional residues held in place by the overall protein structure. Here, we describe deep learning approaches for scaffolding such functional sites without needing to prespecify the fold or secondary structure of the scaffold. The first approach, “constrained hallucination,” optimizes sequences such that their predicted structures contain the desired functional site. The second approach, “inpainting,” starts from the functional site and fills in additional sequence and structure to create a viable protein scaffold in a single forward pass through a specifically trained RoseTTAFold network. We use these two methods to design candidate immunogens, receptor traps, metalloproteins, enzymes, and protein-binding proteins and validate the designs using a combination of in silico and experimental tests.
An outstanding challenge in protein design is the design of binders against therapeutically relevant target proteins via scaffolding the discontinuous binding interfaces present in their often large and complex binding partners. There is currently no method for sampling through the almost unlimited number of possible protein structures for those capable of scaffolding a specified discontinuous functional site; instead, current approaches make the sampling problem tractable by restricting search to structures composed of pre-defined secondary structural elements. Such restriction of search has the disadvantage that considerable trial and error can be required to identify architectures capable of scaffolding an arbitrary discontinuous functional site, and only a tiny fraction of possible architectures can be explored. Here we build on recent advances in de novo protein design by deep network hallucination to develop a solution to this problem which eliminates the need to pre-specify the structure of the scaffolding in any way. We use the trRosetta residual neural network, which maps input sequences to predicted inter-residue distances and orientations, to compute a loss function which simultaneously rewards recapitulation of a desired structural motif and the ideality of the surrounding scaffold, and generate diverse structures harboring the desired binding interface by optimizing this loss function by gradient descent. We illustrate the power and versatility of the method by scaffolding binding sites from proteins involved in key signaling pathways with a wide range of secondary structure compositions and geometries. The method should be broadly useful for designing small stable proteins containing complex functional sites.
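The composite loss described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the functional form of either term, so this sketch assumes a cross-entropy term over motif residue pairs and a Kullback-Leibler divergence from a background distribution as the "ideality" term. The array shapes, function names, and weights (`w_motif`, `w_ideal`) are all hypothetical. The network that produces `pred` from a sequence, and the gradient descent over sequences, are not reproduced here.

```python
import numpy as np

EPS = 1e-9  # avoids log(0)

def motif_cross_entropy(pred, target_bins, motif_pairs):
    """Mean negative log-probability assigned to the target distance bin
    for each motif residue pair (assumed form of the motif term)."""
    losses = [-np.log(pred[i, j, target_bins[(i, j)]] + EPS) for i, j in motif_pairs]
    return float(np.mean(losses))

def background_kl(pred, background, free_pairs):
    """Mean KL divergence of predicted distance distributions from a
    background distribution over non-motif pairs (assumed ideality proxy:
    sharper, more confident predictions give larger values)."""
    kls = [
        np.sum(pred[i, j] * (np.log(pred[i, j] + EPS) - np.log(background[i, j] + EPS)))
        for i, j in free_pairs
    ]
    return float(np.mean(kls))

def hallucination_loss(pred, background, target_bins, motif_pairs, free_pairs,
                       w_motif=1.0, w_ideal=1.0):
    """Lower is better: reward motif recapitulation and confident scaffold predictions."""
    return (w_motif * motif_cross_entropy(pred, target_bins, motif_pairs)
            - w_ideal * background_kl(pred, background, free_pairs))

# Toy example: 4 residues, 3 distance bins (hypothetical sizes).
L, B = 4, 3
uniform = np.full((L, L, B), 1.0 / B)
target_bins = {(0, 1): 2}   # motif pair (0, 1) should fall in bin 2
motif_pairs = [(0, 1)]
free_pairs = [(2, 3)]

sharp = uniform.copy()
sharp[0, 1] = [0.01, 0.01, 0.98]  # motif pair predicted in the target bin
sharp[2, 3] = [0.90, 0.05, 0.05]  # scaffold pair predicted confidently

loss_uniform = hallucination_loss(uniform, uniform, target_bins, motif_pairs, free_pairs)
loss_sharp = hallucination_loss(sharp, uniform, target_bins, motif_pairs, free_pairs)
```

In this toy case the confident prediction that places the motif pair in its target bin scores a lower loss than the uninformative uniform prediction, which is the behavior an optimizer over sequences would exploit.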
Current approaches to de novo design of proteins harboring a desired binding or catalytic motif require pre-specification of an overall fold or secondary structure composition, and hence considerable trial and error can be required to identify protein structures capable of scaffolding an arbitrary functional site. Here we describe two complementary approaches to the general functional site design problem that employ the RoseTTAFold and AlphaFold neural networks, which map input sequences to predicted structures. In the first “constrained hallucination” approach, we carry out gradient descent in sequence space to optimize a loss function which simultaneously rewards recapitulation of the desired functional site and the ideality of the surrounding scaffold, supplemented with problem-specific interaction terms, to design candidate immunogens presenting epitopes recognized by neutralizing antibodies, receptor traps for escape-resistant viral inhibition, metalloproteins and enzymes, and target binding proteins with designed interfaces expanding around known binding motifs. In the second “missing information recovery” approach, we start from the desired functional site and jointly fill in the missing sequence and structure information needed to complete the protein in a single forward pass through an updated RoseTTAFold trained to recover sequence from structure in addition to structure from sequence. We show that the two approaches have considerable synergy, and AlphaFold2 structure prediction calculations suggest that the approaches can accurately generate proteins containing a very wide array of functional sites.