Despite the promise of deep-learning-accelerated protein engineering, examples of such improved proteins are scarce. Here we report that a 3D convolutional neural network trained to associate amino acids with neighboring chemical microenvironments can guide identification of novel gain-of-function mutations that are not predicted by energetics-based approaches. Amalgamation of these mutations improved protein function in vivo across three diverse proteins by at least 5-fold. Furthermore, this model provides a means to interrogate the chemical space within protein microenvironments and identify specific chemical interactions that contribute to the gain-of-function phenotypes resulting from individual mutations.
We have found that the overproduction of enzymes in bacteria followed by their lyophilization leads to 'cellular reagents' that can be directly used to carry out numerous molecular biology reactions. We demonstrate the use of cellular reagents in a variety of molecular diagnostics, such as TaqMan qPCR with no diminution in sensitivity, and in synthetic biology cornerstones such as the Gibson assembly of DNA fragments, where new plasmids can be constructed solely based on adding cellular reagents. Cellular reagents have significantly reduced complexity and cost of production, storage and implementation, features that should facilitate accessibility and use in resource-poor conditions.
While deep learning methods exist to guide protein optimization, examples of novel proteins generated with these techniques require a priori mutational data. Here we report a 3D convolutional neural network that associates amino acids with neighboring chemical microenvironments at state-of-the-art accuracy. This algorithm enables identification of novel gain-of-function mutations, and subsequent experiments confirm substantive improvements in stability-associated phenotypes in vivo across three diverse proteins.
Introduction
Protein engineering is a transformative approach in biotechnology and biomedicine, commonly used to alter natural proteins to tolerate non-native environments 1, modify substrate specificity 2, and improve catalytic activity 3. Underpinning these properties is a protein's ability to fold and adopt a stable active configuration. This property is currently engineered either from sequence 4 or from energetic simulations 5. Deep learning approaches have been reported; however, these models either predict empirically measured stability effects in biased datasets containing only thousands of annotated observations 6 or require model training on the target protein 7, 8. Recently, a 3D-CNN was trained to associate local protein microenvironments with their central amino acid 9. Given structural data, this model was able to predict wild-type amino acids at positions where destabilizing mutations had been experimentally introduced. We hypothesized that the converse might also be true: stabilizing, gain-of-function mutations could be introduced at positions where the wild-type residue is disfavored. Here, we use a deep learning algorithm to improve in vivo protein functionality several fold by introducing mutations to better align proteins with amino acid-structure relationships gleaned from the entirety of the observed proteome.
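The hypothesis above, that useful mutations may sit at positions where the model disfavors the wild-type residue, can be illustrated with a minimal sketch. It assumes the network has already produced a probability for each amino acid at each position. The function name `rank_disfavored_positions`, the log-ratio score, and the toy probabilities are illustrative assumptions, not the authors' method or data.

```python
import math
from typing import Dict, List, Tuple


def rank_disfavored_positions(
    wild_type: str,
    predicted: List[Dict[str, float]],
    top_n: int = 3,
) -> List[Tuple[int, str, str, float]]:
    """Rank positions where the model prefers a residue other than wild type.

    `predicted[i]` maps one-letter amino acid codes to the (hypothetical)
    model probability at position i. Returns tuples of
    (1-based position, wild-type residue, suggested residue, log ratio),
    sorted so the most disfavored wild-type residues come first.
    """
    if len(wild_type) != len(predicted):
        raise ValueError("sequence and prediction lengths differ")

    candidates = []
    for index, (wt, probs) in enumerate(zip(wild_type, predicted)):
        best = max(probs, key=probs.get)
        if best == wt:
            continue  # the model agrees with the wild-type residue
        p_wt = max(probs.get(wt, 0.0), 1e-9)  # avoid division by zero
        score = math.log(probs[best] / p_wt)
        candidates.append((index + 1, wt, best, score))

    candidates.sort(key=lambda c: c[3], reverse=True)
    return candidates[:top_n]


# Toy probabilities for a three-residue sequence (made up for illustration).
toy_predictions = [
    {"M": 0.9, "L": 0.1},
    {"K": 0.1, "R": 0.6, "E": 0.3},
    {"V": 0.4, "I": 0.5, "L": 0.1},
]
for position, wt, suggested, score in rank_disfavored_positions("MKV", toy_predictions):
    print(f"{wt}{position}{suggested}  log ratio = {score:.2f}")
```

A ranking like this only nominates candidates. As the text describes, whether a suggested substitution is actually gain-of-function has to be confirmed experimentally.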
Results
In order to generate an algorithm that could identify unfavorable amino acid residues in virtually any protein structure, we trained a model to learn the correct association between an amino acid and its surrounding chemical environment, relying on the wealth of structures in the Protein Data Bank. We began by rebuilding the neural network architecture published by Torng and Altman with minor modifications (Fig. 1a, see Online Methods for details), replicating the reported classification accuracy of 41.2% (Fig. 1b) using the original training and testing sets (32,760 and 1601 structures, respectively) 9. To improve the model's performance, we made several discrete changes towards more explicit biophysical annotations, adding in new atomic channels for hydrogen atoms and accommodating the partial charge and solvent accessibility for each atom, increasing accuracy to 43.4% and 52.4%, respectively.
The selection methodology for both protein structures and amino acid residues introduced several biases to the training data. The dataset contained multiple structures of closely related proteins
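To make the idea of per-element "atomic channels" concrete, here is a rough sketch of how atoms around a residue might be binned into a voxel grid with one channel per element, including a hydrogen channel. The 20 Å box, 1 Å voxel size, channel order, sparse dictionary representation, and the function name `voxelize` are all assumptions chosen for illustration; the excerpt does not specify them, and the partial charge and solvent accessibility inputs are omitted.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumed channel order; "H" stands in for the added hydrogen channel.
CHANNELS = ("C", "N", "O", "S", "H")

Atom = Tuple[str, float, float, float]  # element, x, y, z in angstroms
VoxelKey = Tuple[int, int, int, int]    # channel, i, j, k


def voxelize(
    atoms: Iterable[Atom],
    center: Tuple[float, float, float],
    box: float = 20.0,   # assumed box edge length in angstroms
    voxel: float = 1.0,  # assumed voxel edge length in angstroms
) -> Dict[VoxelKey, int]:
    """Count atoms per (channel, i, j, k) voxel in a cube around `center`.

    Atoms outside the box, or with an element that has no channel,
    are ignored. The result is sparse: empty voxels are not stored.
    """
    half = box / 2.0
    n = int(box / voxel)
    grid: Counter = Counter()
    for element, x, y, z in atoms:
        if element not in CHANNELS:
            continue
        idx = []
        for coord, c in zip((x, y, z), center):
            i = int((coord - c + half) // voxel)
            if not 0 <= i < n:
                break  # atom lies outside the box on this axis
            idx.append(i)
        else:
            grid[(CHANNELS.index(element), *idx)] += 1
    return dict(grid)


# Toy atoms (made up): one is out of range and one has no channel.
toy_atoms = [
    ("C", 0.0, 0.0, 0.0),
    ("N", 1.5, 0.0, 0.0),
    ("O", 30.0, 0.0, 0.0),
    ("H", -10.0, -10.0, -10.0),
    ("X", 0.0, 0.0, 0.0),
]
print(voxelize(toy_atoms, center=(0.0, 0.0, 0.0)))
```

A dense tensor of shape (channels, n, n, n) would be the usual input to a 3D convolutional network. The sparse counts here simply keep the sketch dependency-free.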
Since the fixation of the genetic code, evolution has largely been confined to 20 proteinogenic amino acids. The development of orthogonal translation systems that allow for the codon-specific incorporation of noncanonical amino acids may provide a means to expand the code, but these translation systems cannot be simply superimposed on cells that have spent billions of years optimizing their genomes with the canonical code. We have therefore carried out directed evolution experiments with an orthogonal translation system that inserts 3-nitro-L-tyrosine across from amber codons, creating a 21 amino acid genetic code in which the amber stop codon ambiguously encodes either 3-nitro-L-tyrosine or stop. The 21 amino acid code is enforced through the inclusion of an addicted, essential gene, a beta-lactamase dependent upon 3-nitro-L-tyrosine incorporation. After 2000 generations of directed evolution, the fitness deficit of the original strain was largely repaired through mutations that limited the toxicity of the noncanonical amino acid. While the evolved lineages had not resolved the ambiguous coding of the amber codon, the improvements in fitness allowed new amber codons to populate protein coding sequences.
While bacteriophage have previously been used as a model system to understand thermal adaptation, most adapted genomes observed to date contain very few modifications and cover a limited temperature range. Here, we set out to investigate genome adaptation to thermal stress by adapting six populations of T7 bacteriophage virions to increasingly stringent heat challenges. Further, we provided three of the phage populations access to a new genetic code in which Amber codons could be read as selenocysteine, potentially allowing the formation of more stable selenide-containing bonds. Phage virions responded to the thermal challenges with a greater than 10°C increase in heat tolerance and fixed highly reproducible patterns of nonsynonymous substitutions and genome deletions. Most fixed mutations mapped to either the tail complex or to the three internal virion proteins that form a pore across the E. coli cell membrane during DNA injection. However, few global changes in Amber codon usage were observed, with only one natural Amber codon being lost. These results reinforce a model in which adaptation to thermal stress proceeds via the cumulative fixation of a small set of highly adaptive substitutions, and in which adaptation to new genetic codes proceeds only slowly, even when phenotypic advantages are possible.