Estimating the Prevalence of Protein Sequences Adopting Functional Enzyme Folds

Axe, Douglas D.

doi:10.1016/j.jmb.2004.06.058

Cited by 69 publications

(49 citation statements)

References 46 publications


“…As selection progresses, promiscuity is diminished and enzymes become specific for one function while losing their ability to catalyze other types of reactions [51]. The results reported here indicate that for sequences capable of folding into protein-like structures, achieving some level of binding and/or catalytic activity is not difficult [52], and can occur frequently in unevolved collections. Thus, the challenge, both for the early stages of biological evolution and for modern efforts in protein design, is not simply to produce activity, but to rein in the promiscuity of unevolved proteins, and ultimately produce biocatalysts that are highly active towards particular substrates.…”

Section: Specificity Versus Promiscuity In Evolved and Unevolved Prot…

mentioning

confidence: 79%

Cofactor binding and enzymatic activity in an unevolved superfamily of de novo designed 4‐helix bundle proteins

et al. 2009


To probe the potential for enzymatic activity in unevolved amino acid sequence space, we created a combinatorial library of de novo 4-helix bundle proteins. This collection of novel proteins can be considered an "artificial superfamily" of helical bundles. The superfamily of 102-residue proteins was designed using binary patterning of polar and nonpolar residues, and expressed in Escherichia coli from a library of synthetic genes. Sequences from the library were screened for a range of biological functions including heme binding and peroxidase, esterase, and lipase activities. Proteins exhibiting these functions were purified and characterized biochemically. The majority of de novo proteins from this superfamily bound the heme cofactor, and a sizable fraction of the proteins showed activity significantly above background for at least one of the tested enzymatic activities. Moreover, several of the designed 4-helix bundle proteins showed activity in all of the assays, thereby demonstrating the functional promiscuity of unevolved proteins. These studies reveal that de novo proteins, which have neither been designed for function nor subjected to evolutionary pressure (either in vivo or in vitro), can provide rudimentary activities and serve as a "feedstock" for evolution.



“…On one hand, it has become clear from protein studies that the proportion of amino acid sequences that can be translated into functional proteins is very small. For proteins about 100 amino acids long, there are 20^100 = 10^130 possible sequences, yet only about 1 in 10^74 [75] to 1 in 10^63 [76] are capable of forming functional structures, and most enzymes in an organism such as E. coli are over 300 amino acids long [77]. By comparison, it has been estimated that only 10^120 to 10^140 quantum particle interactions can have occurred in the entire universe since the Big Bang [78,79], and the probabilistic resources relevant to chemical reactions on Earth allow only about 10^70 events [80].…”

Section: Irreducible Complexity and The Waiting Time To Beneficial…

mentioning
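
The sequence-space arithmetic in the statement above can be checked with a short calculation. The following is a minimal sketch, not taken from any of the cited papers: the function names are hypothetical, and it simply takes the quoted prevalence figures (1 in 10^74 and 1 in 10^63) at face value. It works in log10 units to keep the numbers readable.

```python
import math


def log10_sequence_space(alphabet_size: int, length: int) -> float:
    """Return log10 of the number of sequences of a given length.

    For 20 amino acids and 100 residues this is log10(20**100).
    """
    return length * math.log10(alphabet_size)


def log10_functional_count(log10_space: float, log10_one_in: float) -> float:
    """Return log10 of the implied number of functional sequences.

    log10_one_in is the exponent in a "1 in 10^x" prevalence estimate,
    so the count is 10**log10_space / 10**log10_one_in.
    """
    return log10_space - log10_one_in


if __name__ == "__main__":
    space = log10_sequence_space(alphabet_size=20, length=100)
    print(f"log10(20^100) = {space:.2f}")

    # Prevalence exponents as quoted in the citation statement (assumed, not re-derived).
    for exponent in (74, 63):
        count = log10_functional_count(space, exponent)
        print(f"1 in 10^{exponent} implies about 10^{count:.1f} functional sequences")
```

Under these assumptions, 20^100 is slightly larger than 10^130, so the "=" in the quoted statement is best read as an order-of-magnitude approximation.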

confidence: 99%

Computational Evolution Experiments Reveal a Net Loss of Genetic Information Despite Selection

Nelson

Sanford

2013

Biological Information


Computational evolution experiments using the population genetics simulation Mendel's Accountant have suggested that deleterious mutation accumulation may pose a threat to the long-term survival of many biological species. By contrast, experiments using the program Avida have suggested that purifying selection is extremely effective and that novel genetic information can arise via selection for high-impact beneficial mutations. The present study shows that these approaches yield seemingly contradictory results only because of disparate parameter settings. Both agree when similar settings are used, and both reveal a net loss of genetic information under biologically relevant conditions. Further, both approaches establish the existence of three potentially prohibitive barriers to the evolution of novel genetic information: (1) the selection threshold and resulting genetic decay; (2) the waiting time to beneficial mutation; and (3) the pressure of reductive evolution, i.e., the selective pressure to shrink the genome and disable unused functions. The adequacy of mutation and natural selection for producing and sustaining novel genetic information cannot be properly assessed without a careful study of these issues.


“…First, the space of possible protein sequences is incomprehensibly large and will never be searched exhaustively by any means, naturally, in the laboratory, or computationally (2, 3). Second, within this vast space, functional proteins are extremely scarce, with estimates that range from a high of 1 in 10^11 to as little as 1 in 10^77 (4, 5). Of the sequences that are functional, most have poor fitness and their numbers decrease exponentially with higher levels of fitness (6, 7).…”

mentioning

confidence: 99%

Navigating the protein fitness landscape with Gaussian processes

Romero

Krause

Arnold

2012

Proc. Natl. Acad. Sci. U.S.A.



Knowing how protein sequence maps to function (the "fitness landscape") is critical for understanding protein evolution as well as for engineering proteins with new and useful properties. We demonstrate that the protein fitness landscape can be inferred from experimental data, using Gaussian processes, a Bayesian learning technique. Gaussian process landscapes can model various protein sequence properties, including functional status, thermostability, enzyme activity, and ligand binding affinity. Trained on experimental data, these models achieve unrivaled quantitative accuracy. Furthermore, the explicit representation of model uncertainty allows for efficient searches through the vast space of possible sequences. We develop and test two protein sequence design algorithms motivated by Bayesian decision theory. The first one identifies small sets of sequences that are informative about the landscape; the second one identifies optimized sequences by iteratively improving the Gaussian process model in regions of the landscape that are predicted to be optimized. We demonstrate the ability of Gaussian processes to guide the search through protein sequence space by designing, constructing, and testing chimeric cytochrome P450s. These algorithms allowed us to engineer active P450 enzymes that are more thermostable than any previously made by chimeragenesis, rational design, or directed evolution.

Keywords: protein engineering | recombination | machine learning | experimental design | active learning

In the mapping of protein sequence to protein behavior, the phenotype can be envisioned as a surface, or landscape, over the high-dimensional space of possible sequences (1). This "fitness landscape" could describe how the protein contributes to organismal fitness, or it may represent a biophysical property, such as stability, enzyme activity, or ligand binding affinity. The structure of this surface describes the spectrum of possible phenotypes as well as the mutational accessibility among them and therefore strongly influences protein evolution. This surface is also the objective function for protein engineering, which seeks to identify protein sequences that are highly optimized for a given property or set of properties.

Identifying such optimized sequences is extremely challenging for several reasons. First, the space of possible protein sequences is incomprehensibly large and will never be searched exhaustively by any means, naturally, in the laboratory, or computationally (2, 3). Second, within this vast space, functional proteins are extremely scarce, with estimates that range from a high of 1 in 10^11 to as little as 1 in 10^77 (4, 5). Of the sequences that are functional, most have poor fitness and their numbers decrease exponentially with higher levels of fitness (6, 7). Thus, highly fit sequences are vanishingly rare and overwhelmed by nonfunctional and mediocre sequences.

Computational protein engineering uses models of protein function to guide a search for optimized sequences. These models typically contain an atomic struc...
