This article introduces the computational procedure FamClash for analyzing incompatibilities in engineered protein hybrids by using protein family sequence data. All pairs of residue positions in the sequence alignment that conserve the property triplet of charge, volume, and hydrophobicity are first identified, and significant deviations are denoted as residue-residue clashes. This approach moves beyond earlier efforts aimed at solely classifying hybrids as functional or nonfunctional by correlating the rank ordering of these hybrids based on their activity levels. Experimental testing of this approach was performed in parallel to assess the predictive ability of FamClash. As a model system, single-crossover ITCHY (incremental truncation for the creation of hybrid enzymes) libraries were prepared from the Escherichia coli and Bacillus subtilis dihydrofolate reductases, and the activities of functional hybrids were determined. Comparisons of the predicted clash map as a function of crossover position revealed good agreement with activity data, reproducing the observed V shape and matching the location of a local peak in activity.
Keywords: protein engineering | dihydrofolate reductase | residue-residue clash | computational hybrid prescreening | incremental truncation
Recent advances in protein engineering (1-5) have allowed researchers to go beyond the limitations of homology-dependent directed evolution methods. The ability to freely explore protein sequence space has revealed a number of troublesome trends. First, the lower the sequence identity of the recombined parental sequences, the smaller the percentage of the combinatorial protein library that remains functional (2, 4). This has been reported in several studies (6-8) using differing protocols, thus implicating the global nature of this effect. More troublesome is the finding that the remaining functional hybrids tend to have only residual activities.
Therefore, it appears that exploring protein sequence space freely comes at the expense of severely degrading the average stability and functionality of the combinatorial library. This has motivated the development of computational methods to prescreen hybrids for their potential of being stably folded (9) and functional. These analyses then serve to direct the sampling of protein sequences by the combinatorial library toward desirable regions in sequence space. Specifically, favorable positions for junctions between fragments from different parental sequences can be identified, and restrictions can be imposed on sets of parental sequences that contribute fragments to a particular junction.
Therefore, further improvements in the stability and functionality of hybrid proteins may be attained by developing quantitative methods that identify deleterious interactions arising from residue pairs within the gene fragment combinations. To this end, Monte Carlo simulations by Bogarad and Deem (10) suggested that swapping of low-energy structures is least disruptive to protein structure. The SCHEMA algorithm (11) postulates t...
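The pairwise conservation test described in the FamClash abstract can be illustrated with a small sketch. The abstract does not state how "conserved" or "significant deviation" is measured, so the min/max family range with a per-property tolerance used here is an assumption, as are the function names and the toy property table (charge, approximate side-chain volume in cubic angstroms, and Kyte-Doolittle hydropathy for a few residues only).

```python
# Illustrative (charge, approx. volume, hydropathy) values for a few residues.
# The table and the tolerances below are assumptions made for this sketch.
PROPS = {
    "D": (-1, 111.1, -3.5),
    "E": (-1, 138.4, -3.5),
    "K": (1, 168.6, -3.9),
    "R": (1, 173.4, -4.5),
    "A": (0, 88.6, 1.8),
    "L": (0, 166.7, 3.8),
}

def pair_sum(a, b):
    """Summed property triplet of a residue pair."""
    return tuple(x + y for x, y in zip(PROPS[a], PROPS[b]))

def family_ranges(alignment, i, j):
    """Min/max of each summed property over the family for columns i and j."""
    sums = [pair_sum(seq[i], seq[j]) for seq in alignment]
    return [(min(s[k] for s in sums), max(s[k] for s in sums)) for k in range(3)]

def is_clash(alignment, i, j, res_i, res_j, tol=(0.0, 20.0, 1.0)):
    """Flag a hybrid pair whose summed properties leave the family range."""
    value = pair_sum(res_i, res_j)
    ranges = family_ranges(alignment, i, j)
    return any(v < lo - t or v > hi + t
               for v, (lo, hi), t in zip(value, ranges, tol))

# Toy family in which positions 0 and 1 always form an oppositely charged pair.
family = ["DK", "EK", "KD", "RE"]
print(is_clash(family, 0, 1, "D", "D"))  # like charges from different parents
print(is_clash(family, 0, 1, "D", "K"))  # pair already seen in the family
```

In this toy family the net charge of the pair is always zero, so a hybrid that places two acidic residues at these positions falls outside the family range and is flagged.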
A number of computational approaches have been developed to reengineer promising chimeric proteins one at a time through targeted point mutations. In this article, we introduce the computational procedure IPRO (iterative protein redesign and optimization procedure) for the redesign of an entire combinatorial protein library in one step using energy-based scoring functions. IPRO relies on identifying mutations in the parental sequences that, when propagated downstream in the combinatorial library, improve the average quality of the library (e.g., stability, binding affinity, or specific activity). Residue and rotamer design choices are driven by a globally convergent mixed-integer linear programming formulation. Unlike many of the available computational approaches, the procedure allows for backbone movement as well as redocking of the associated ligands after a prespecified number of design iterations. IPRO can also be used, as a limiting case, for the redesign of a single sequence or a handful of individual sequences. The application of IPRO is highlighted through the redesign of a 16-member library of Escherichia coli/Bacillus subtilis dihydrofolate reductase hybrids, both individually and through upstream parental sequence redesign, for improving the average binding energy. Computational results demonstrate that it is indeed feasible to improve the overall library quality, as exemplified by binding energy scores, through targeted mutations in the parental sequences.
In this paper, we introduce and test two new sequence-based protein scoring systems (S1 and S2) for assessing the likelihood that a given protein hybrid will be functional. By binning together amino acids with similar properties (volume, hydrophobicity, and charge), the scoring systems S1 and S2 allow for the quantification of the severity of mismatched interactions in the hybrids. The S2 scoring system is found to enrich a cytochrome P450 library in functional members significantly better than other scoring methods. Given this scoring base, we subsequently constructed two separate optimization formulations (OPTCOMB and OPTOLIGO) for optimally designing protein combinatorial libraries involving recombination or mutations, respectively. Notably, two separate versions of OPTCOMB are generated (models M1 and M2), with the latter allowing for position-dependent parental fragment skipping. Computational benchmarking results demonstrate the efficacy of OPTCOMB and OPTOLIGO in generating high-scoring libraries of a prespecified size.
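The idea of binning amino acids by property and scoring the severity of a mismatch can be sketched as follows. The abstract does not define S1 or S2, so this is not either scoring system: the bin boundaries, the names `SIZE_BIN`, `HYDRO_BIN`, and `mismatch_severity`, and the "distance to the nearest parental pair" rule are all assumptions chosen only to show how discrete bins turn a mismatch into a graded score.

```python
# Hypothetical coarse bins; the groupings are illustrative, not those of S1/S2.
SIZE_BIN = {**dict.fromkeys("GASCTPDN", 0),
            **dict.fromkeys("QEHVILMK", 1),
            **dict.fromkeys("RFYW", 2)}
HYDRO_BIN = {**dict.fromkeys("RKDENQH", 0),
             **dict.fromkeys("STYWGP", 1),
             **dict.fromkeys("ACMFILV", 2)}
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}  # every other residue counts as neutral

def bins(res):
    """Binned (size, hydrophobicity, charge) class of one residue."""
    return (SIZE_BIN[res], HYDRO_BIN[res], CHARGE.get(res, 0))

def pair_bins(a, b):
    """Summed bin triplet of a residue pair."""
    return tuple(x + y for x, y in zip(bins(a), bins(b)))

def mismatch_severity(hybrid_pair, parental_pairs):
    """Bin distance from the hybrid pair to the closest parental pair."""
    h = pair_bins(*hybrid_pair)
    return min(sum(abs(x - y) for x, y in zip(h, pair_bins(*p)))
               for p in parental_pairs)

parents = [("D", "K"), ("K", "D")]
for pair in [("D", "K"), ("E", "K"), ("D", "D")]:
    print(pair, mismatch_severity(pair, parents))
```

A conservative substitution within the same charge class scores low, while a pair that inverts the net charge scores high, which is the kind of graded severity the abstract refers to.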
Protein co-evolution under structural and functional constraints necessitates the preservation of important interactions. Identifying functionally important regions poses many obstacles in protein engineering efforts. In this paper, we present a bioinformatics-inspired approach (residue correlation analysis, RCA) for predicting functionally important domains from protein family sequence data. RCA comprises two major steps: (i) identifying pairs of residue positions that mutate in a coordinated manner, and (ii) using these results to identify protein regions that interact with an uncommonly high number of other residues. We hypothesize that strongly correlated pairs result not only from contacting pairs, but also from residues that participate in conformational changes during catalysis or in important interactions necessary for retaining functionality. The results show that highly mobile loops that assist in ligand association/dissociation tend to exhibit high correlation. RCA results exhibit good agreement with the findings of experimental and molecular dynamics studies for the three protein families that are analyzed: (i) DHFR (dihydrofolate reductase), (ii) cyclophilin, and (iii) formyl-transferase. Specifically, the specificity (percentage of correct predictions) in all three cases is substantially higher than that obtained by entropic measures or contacting residue pairs. In addition, we use our approach in a predictive fashion to identify important regions of a transmembrane amino acid transporter protein for which there is limited structural and functional information available.
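The two RCA steps can be sketched on a toy alignment. The abstract does not say which correlation statistic RCA uses, so mutual information between alignment columns is swapped in here as a common stand-in; the threshold and the function names are likewise assumptions.

```python
from collections import Counter
from math import log2

def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    p_i, p_j = Counter(col_i), Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    return sum((c / n) * log2((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
               for (a, b), c in p_ij.items())

def correlated_partner_counts(alignment, threshold):
    """Step (i): find correlated column pairs. Step (ii): count partners per position."""
    columns = list(zip(*alignment))
    counts = [0] * len(columns)
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if mutual_information(columns[i], columns[j]) >= threshold:
                counts[i] += 1
                counts[j] += 1
    return counts

# Columns 0 and 1 co-vary; column 2 varies independently of both.
toy_alignment = ["DKA", "DKG", "KDA", "KDG"]
print(correlated_partner_counts(toy_alignment, threshold=0.5))
```

Positions with many correlated partners would be the candidates for functionally important regions; on real alignments the threshold would need to account for family size and phylogenetic bias, which this sketch ignores.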
In this article, we introduce a rapid, protein sequence database-driven approach to characterize all contacting residue pairs present in protein hybrids for inconsistency with protein family structural features. This approach is based on examining contacting residue pairs with different parental origins for different types of potentially unfavorable interactions (i.e. electrostatic repulsion, steric hindrance, cavity formation and hydrogen bond disruption). The identified clashing residue pairs between members of a protein family are then contrasted against functionally characterized hybrid libraries. Comparisons for five different protein recombination studies available in the literature: (i) glycinamide ribonucleotide transformylase (GART) from Escherichia coli (purN) and human (hGART), (ii) human Mu class glutathione S-transferase (GST) M1-1 and M2-2, (iii) beta-lactamase TEM-1 and PSE-4, (iv) catechol-2,3-oxygenase xylE and nahH, and (v) dioxygenases (toluene dioxygenase, tetrachlorobenzene dioxygenase and biphenyl dioxygenase) reveal that the patterns of identified clashing residue pairs are remarkably consistent with experimentally found patterns of functional crossover profiles. Specifically, we show that the proposed residue clash maps are on average 5.0 times more effective than randomly generated clashes and 1.6 times more effective than residue contact maps at explaining the observed crossover distributions among functional members of hybrid libraries. This suggests that residue clash maps can provide quantitative guidelines for the placement of crossovers in the design of protein recombination experiments.
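A clash map as a function of crossover position can be sketched for single-crossover hybrids. This is a simplified illustration under stated assumptions: only electrostatic repulsion is checked (the abstract also lists steric hindrance, cavity formation, and hydrogen bond disruption), the contact list is assumed to be given as 0-based index pairs with `i < j`, the parents are assumed to be aligned and of equal length, and all names are hypothetical.

```python
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}  # every other residue counts as neutral

def make_hybrid(parent_a, parent_b, crossover):
    """Single-crossover hybrid: parent_a before the crossover, parent_b after."""
    return parent_a[:crossover] + parent_b[crossover:]

def repulsive(a, b):
    """True when both residues carry charge of the same sign."""
    return CHARGE.get(a, 0) * CHARGE.get(b, 0) > 0

def clash_profile(parent_a, parent_b, contacts):
    """Number of repulsive cross-parent contacts at each crossover position."""
    profile = []
    for c in range(1, len(parent_a)):
        h = make_hybrid(parent_a, parent_b, c)
        # i < c <= j selects contacts whose two residues come from different parents.
        profile.append(sum(1 for i, j in contacts
                           if i < c <= j and repulsive(h[i], h[j])))
    return profile

# Both parents keep salt bridges at (0, 3) and (1, 2), with the charges swapped.
print(clash_profile("DEKK", "KKED", contacts=[(0, 3), (1, 2)]))
```

Crossovers that separate more contacting pairs from their original partners produce more clashes, so the profile indicates which junctions are likely to be tolerated.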