Preface

Directed evolution circumvents our profound ignorance of how a protein's sequence encodes its function by using iterative rounds of random mutation and artificial selection to discover new and useful proteins. Proteins can be tuned to adapt to new functions or environments via simple adaptive walks involving small numbers of mutations. Directed evolution studies have demonstrated how rapidly at least some proteins can evolve under strong selection pressures, and, because the entire 'fossil record' of evolutionary intermediates is available for detailed study, they have provided new insight into the relationship between sequence and function. Directed evolution has also shown how mutations that are functionally neutral can set the stage for further adaptation.

Millions of years of life's struggle for survival in different environments have led proteins to provide diverse, creative and efficient solutions to a wide range of problems, from extracting energy from the environment to repairing and replicating their own code. Good solutions to biological problems can also be good solutions to human problems: proteins are in fact widely used in the food, chemicals, consumer products, and medical fields. Not content with Nature's protein repertoire, however, protein engineers are working to extend known protein function to new environments or tasks [1-4] and to create new functions altogether [5-7].

Notwithstanding significant advances, a molecular-level understanding of why one protein performs a certain task better than another remains elusive. This state of affairs is perhaps not surprising when we remember that a protein often undergoes conformational changes during function and exists as a dynamic ensemble of conformers that are only slightly more stable than their unfolded and nonfunctional states and that might themselves be functionally diverse [8]. Mutations far from active sites can influence protein function [9,10]. Engineering enzymatic activity is particularly difficult, because very small changes in structure or chemical properties can have very significant effects on catalysis. Thus predicting the amino acid sequence, or changes to an amino acid sequence, that would generate a specific behavior remains a challenge, particularly for applications requiring high performance (such as an industrial enzyme or a therapeutic protein). Unfortunately, where function is concerned, details matter, and we just don't understand the details. Evolution, however, had no difficulty generating these impressive molecules. Despite their complexity and finely-tuned nature, proteins are remarkably evolvable: they can adapt under the pressure of selection, changing behavior, function and even fold. Protein engineers have learned to exploit this evolvability using 'directed evolution', the application of iterative rounds of mutation and artificial selection or screening to generate new proteins. Hundreds of directed evolution experiments have demonstrated the ease with which proteins adapt to new challenges [11]. Not...
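The loop of mutation and artificial selection described above can be sketched in a few lines of Python. This is only a toy illustration: the fitness function (number of positions matching a hidden target sequence), the library size, and the names `toy_fitness`, `mutate`, and `directed_evolution` are all assumptions made for the example and do not come from the source. In a real experiment, fitness is measured by a laboratory screen or selection.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(seq, target):
    """Hypothetical fitness: number of positions that match a hidden target."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq, rng):
    """Return a copy of seq with exactly one random point substitution."""
    pos = rng.randrange(len(seq))
    new_residue = rng.choice([a for a in AMINO_ACIDS if a != seq[pos]])
    return seq[:pos] + new_residue + seq[pos + 1:]


def directed_evolution(parent, target, rounds=10, library_size=200, seed=0):
    """Adaptive walk: each round screens a library of single mutants and
    keeps the best variant only if it improves on the current parent."""
    rng = random.Random(seed)
    history = [parent]  # the 'fossil record' of accepted intermediates
    for _ in range(rounds):
        library = [mutate(parent, rng) for _ in range(library_size)]
        best = max(library, key=lambda s: toy_fitness(s, target))
        if toy_fitness(best, target) > toy_fitness(parent, target):
            parent = best
        history.append(parent)
    return history
```

Calling `directed_evolution("MAAAAAAAAA", "MKVLAAGIVG")` returns the parent of every round, so the sequence of accepted mutations can be inspected afterwards, much as the evolutionary intermediates are available for study in a laboratory experiment.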
Knowing how protein sequence maps to function (the "fitness landscape") is critical for understanding protein evolution as well as for engineering proteins with new and useful properties. We demonstrate that the protein fitness landscape can be inferred from experimental data, using Gaussian processes, a Bayesian learning technique. Gaussian process landscapes can model various protein sequence properties, including functional status, thermostability, enzyme activity, and ligand binding affinity. Trained on experimental data, these models achieve unrivaled quantitative accuracy. Furthermore, the explicit representation of model uncertainty allows for efficient searches through the vast space of possible sequences. We develop and test two protein sequence design algorithms motivated by Bayesian decision theory. The first one identifies small sets of sequences that are informative about the landscape; the second one identifies optimized sequences by iteratively improving the Gaussian process model in regions of the landscape that are predicted to be optimized. We demonstrate the ability of Gaussian processes to guide the search through protein sequence space by designing, constructing, and testing chimeric cytochrome P450s. These algorithms allowed us to engineer active P450 enzymes that are more thermostable than any previously made by chimeragenesis, rational design, or directed evolution.

protein engineering | recombination | machine learning | experimental design | active learning

In the mapping of protein sequence to protein behavior, the phenotype can be envisioned as a surface, or landscape, over the high-dimensional space of possible sequences (1). This "fitness landscape" could describe how the protein contributes to organismal fitness, or it may represent a biophysical property, such as stability, enzyme activity, or ligand binding affinity. The structure of this surface describes the spectrum of possible phenotypes as well as the mutational accessibility among them and therefore strongly influences protein evolution. This surface is also the objective function for protein engineering, which seeks to identify protein sequences that are highly optimized for a given property or set of properties.

Identifying such optimized sequences is extremely challenging for several reasons. First, the space of possible protein sequences is incomprehensibly large and will never be searched exhaustively by any means, naturally, in the laboratory, or computationally (2, 3). Second, within this vast space, functional proteins are extremely scarce, with estimates that range from a high of 1 in 10^11 to as little as 1 in 10^77 (4, 5). Of the sequences that are functional, most have poor fitness and their numbers decrease exponentially with higher levels of fitness (6, 7). Thus, highly fit sequences are vanishingly rare and overwhelmed by nonfunctional and mediocre sequences.

Computational protein engineering uses models of protein function to guide a search for optimized sequences. These models typically contain an atomic struc...
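The idea of a Gaussian process landscape with explicit uncertainty, as described in the abstract above, can be illustrated with a minimal sketch. It assumes a zero-mean prior and an exponential kernel on Hamming distance, and it uses an upper-confidence-bound rule as a simple stand-in for the decision-theoretic design algorithms. None of these choices, nor the names `gp_posterior` and `pick_next`, are taken from the source.

```python
import math


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def kernel(a, b, length_scale=2.0):
    """Exponential kernel on Hamming distance (an illustrative choice)."""
    return math.exp(-hamming(a, b) / length_scale)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior(train_seqs, y, query, noise=1e-3, length_scale=2.0):
    """Posterior mean and variance at `query` under a zero-mean GP prior.
    `noise` is the observation-noise variance added to the diagonal."""
    K = [[kernel(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(train_seqs)]
         for i, a in enumerate(train_seqs)]
    k_star = [kernel(query, a, length_scale) for a in train_seqs]
    alpha = solve(K, y)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)
    var = kernel(query, query, length_scale) - sum(
        k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)


def pick_next(train_seqs, y, candidates, beta=2.0):
    """Choose the candidate with the highest upper confidence bound
    (mean + beta * standard deviation); a stand-in acquisition rule."""
    def ucb(seq):
        mean, var = gp_posterior(train_seqs, y, seq)
        return mean + beta * math.sqrt(var)
    return max(candidates, key=ucb)
```

The posterior variance is what makes the search efficient in this sketch: sequences far from all measured ones have high variance and are favored for exploration, while sequences near good measurements have a high predicted mean and are favored for exploitation.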
Natural enzymes are incredibly proficient catalysts, but engineering them to have new or improved functions is challenging due to the complexity of how an enzyme's sequence relates to its biochemical properties. Here, we present an ultrahigh-throughput method for mapping enzyme sequence-function relationships that combines droplet microfluidic screening with next-generation DNA sequencing. We apply our method to map the activity of millions of glycosidase sequence variants. Microfluidic-based deep mutational scanning provides a comprehensive and unbiased view of the enzyme function landscape. The mapping displays expected patterns of mutational tolerance and a strong correspondence to sequence variation within the enzyme family, but also reveals previously unreported sites that are crucial for glycosidase function. We modified the screening protocol to include a high-temperature incubation step, and the resulting thermotolerance landscape allowed the discovery of mutations that enhance enzyme thermostability. Droplet microfluidics provides a general platform for enzyme screening that, when combined with DNA-sequencing technologies, enables high-throughput mapping of enzyme sequence space.

protein engineering | droplet-based microfluidics | high-throughput DNA sequencing

Enzymes are powerful biological catalysts capable of remarkably accelerating the rates of chemical transformations (1). The molecular bases of these rate accelerations are often complex, using multiple steps, multiple catalytic mechanisms, and relying on numerous molecular interactions, in addition to those provided by the main catalytic groups. This complexity imposes a significant barrier to understanding how an enzyme's sequence impacts its function and, thus, on our ability to rationally design biocatalysts with new or enhanced functions (2-4).

Comprehensive mappings of sequence-function relationships can be used to dissect the molecular basis of protein function in an unbiased manner (5). Growth selections or in vitro binding screens can be combined with next-generation DNA sequencing to generate detailed mappings between a protein's sequence and its biochemical properties, such as binding affinity, enzymatic activity, and stability (6-9). This deep mutational scanning approach has been used to study the structure of the protein fitness landscape, discover new functional sites, improve molecular energy functions, and identify beneficial combinations of mutations for protein engineering. However, these methods rely on functional assays coupled to cell growth or protein binding, severely limiting the types of proteins that can be analyzed. For example, most enzymes of biological or industrial relevance cannot be analyzed using existing methods because they do not catalyze a reaction that can be directly coupled to cell growth. Experimental advances are needed to broaden the applicability of deep mutational scanning to the diverse palette of functions performed by enzymes.

In this paper, we present a general method for mapping protein sequence-func...
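Deep mutational scanning of the kind described above turns sequencing read counts from before and after a screen into a per-variant score. The sketch below uses one common convention, a log2 enrichment ratio normalized to wild type, with a pseudocount to avoid division by zero. The source does not specify its scoring scheme, so the formula, the pseudocount value, and the function name `enrichment_scores` are assumptions made for illustration.

```python
import math


def enrichment_scores(pre_counts, post_counts, wild_type="WT", pseudocount=0.5):
    """Score each variant as log2(post frequency / pre frequency),
    relative to the same log ratio for the wild-type sequence.

    pre_counts / post_counts: dicts mapping variant name -> read count
    in the unsorted and sorted (screened) populations, respectively.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())

    def log_ratio(variant):
        pre = (pre_counts.get(variant, 0) + pseudocount) / pre_total
        post = (post_counts.get(variant, 0) + pseudocount) / post_total
        return math.log2(post / pre)

    wt_ratio = log_ratio(wild_type)
    # Positive scores: enriched relative to wild type; negative: depleted.
    return {variant: log_ratio(variant) - wt_ratio for variant in pre_counts}
```

With hypothetical counts such as `pre = {"WT": 1000, "A10G": 1000, "L25P": 1000}` and `post = {"WT": 1000, "A10G": 2000, "L25P": 10}`, the wild type scores zero by construction, the enriched variant scores positive, and the depleted variant scores strongly negative.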
The development of molecular probes that allow in vivo imaging of neural signaling processes with high temporal and spatial resolution remains challenging. Here we applied directed evolution techniques to create magnetic resonance imaging (MRI) contrast agents sensitive to the neurotransmitter dopamine. The sensors were derived from the heme domain of the bacterial cytochrome P450-BM3 (BM3h). Ligand binding to a site near BM3h’s paramagnetic heme iron led to a drop in MRI signal enhancement and a shift in optical absorbance. Using an absorbance-based screen, we evolved the specificity of BM3h away from its natural ligand and toward dopamine, producing sensors with dissociation constants for dopamine of 3.3–8.9 μM. These molecules were used to image depolarization-triggered neurotransmitter release from PC12 cells and in the brains of live animals. Our results demonstrate the feasibility of molecular-level functional MRI using neural activity–dependent sensors, and our protein engineering approach can be generalized to create probes for other targets.