Background: Production of proteins as therapeutic agents, research reagents and molecular tools frequently depends on expression in heterologous hosts. Synthetic genes are increasingly used for protein production because sequence information is easier to obtain than the corresponding physical DNA. Protein-coding sequences are commonly re-designed to enhance expression, but there are no experimentally supported design principles.

Principal Findings: To identify sequence features that affect protein expression we synthesized and expressed in E. coli two sets of 40 genes encoding two commercially valuable proteins, a DNA polymerase and a single chain antibody. Genes differing only in synonymous codon usage expressed protein at levels ranging from undetectable to 30% of cellular protein. Using partial least squares regression we tested the correlation of protein production levels with parameters that have been reported to affect expression. We found that the amount of protein produced in E. coli was strongly dependent on the codons used to encode a subset of amino acids. Favorable codons were predominantly those read by tRNAs that are most highly charged during amino acid starvation, not codons that are most abundant in highly expressed E. coli proteins. Finally we confirmed the validity of our models by designing, synthesizing and testing new genes using codon biases predicted to perform well.

Conclusion: The systematic analysis of gene design parameters shown in this study has allowed us to identify codon usage within a gene as a critical determinant of achievable protein expression levels in E. coli. We propose a biochemical basis for this, as well as design algorithms to ensure high protein production from synthetic genes. Replication of this methodology should allow similar design algorithms to be empirically derived for any expression system.
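As a rough illustration of what "genes differing only in synonymous codon usage" means in practice, the sketch below tabulates, for each amino acid in a coding sequence, the fraction of its occurrences encoded by each codon. Per-gene tables of this kind are one plausible input to a regression against measured expression; this is not the authors' feature set or model. The function names and the two short example sequences are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into its in-frame codons."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a coding sequence into a one-letter protein string."""
    return "".join(CODON_TABLE[c] for c in codons(seq))


def codon_fractions(seq):
    """Map each codon used to its share among codons for the same amino acid."""
    counts = Counter(codons(seq))
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


# Two hypothetical variants of the same four-residue peptide (Met-Leu-Leu-Lys).
variant_a = "ATGCTGCTGAAA"
variant_b = "ATGTTACTGAAG"

# Same protein, different synonymous codon choices.
assert translate(variant_a) == translate(variant_b)
fractions_a = codon_fractions(variant_a)
fractions_b = codon_fractions(variant_b)
```

Here `variant_a` encodes both leucines with CTG, while `variant_b` splits them between TTA and CTG, so the two tables differ even though the translated protein is identical.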
To exploit the huge potential of whole-genome sequence information, the ability to efficiently synthesize long, accurate DNA sequences is becoming increasingly important. An approach proposed toward this end involves the synthesis of ≈5-kb segments of DNA, followed by their assembly into longer sequences by conventional cloning methods [Smith, H. O., Hutchinson, C. A., III, Pfannkoch, C. & Venter, J. C. (2003) Proc. Natl. Acad. Sci. USA 100, 15440-15445]. The major current impediment to the success of this tactic is the difficulty of building the ≈5-kb components accurately, efficiently, and rapidly from short synthetic oligonucleotide building blocks. We have developed and implemented a strategy for the high-throughput synthesis of long, accurate DNA sequences. Unpurified 40-base synthetic oligonucleotides are built into 500- to 800-bp "synthons" with low error frequency by automated PCR-based gene synthesis. By parallel processing, these synthons are efficiently joined into multisynthon ≈5-kb segments by using only three endonucleases and "ligation by selection." These large segments can be subsequently assembled into very long sequences by conventional cloning. We validated the approach by building a synthetic 31,656-bp polyketide synthase gene cluster whose functionality was demonstrated by its ability to produce the megaenzyme and its polyketide product in Escherichia coli.

The chemical synthesis of genes and genomes has received considerable attention for several decades and is becoming increasingly important in the exploitation of whole-genome sequence information. The field was pioneered by Khorana and coworkers with the then-heroic total synthesis of tRNA structural genes (1, 2) and by Itakura et al. (3) with the synthesis and expression of the somatostatin gene. Since then, DNA synthesis methodology has made steady progress, with current approaches relying on the enzyme-catalyzed assembly of short, chemically synthesized oligonucleotides.
Of the various methods, polymerase cycling assembly (PCA) (4) is the most widely used because of its inherent simplicity. Overlapping, complementary oligonucleotides are annealed and recursively elongated with a heat-stable DNA polymerase to ultimately yield a full-length sequence, which is amplified by conventional PCR. PCA, first reported for synthesis of the 303-bp HIV-2 Rev gene (5), has since evolved (6-8) into a widely used general method for synthesis of genes of up to ≈1 kb.

The 1-kb size barrier was broken in 1990 by Mandecki et al. (9), who synthesized a 2.1-kb plasmid by ligation of 30 fragments, and again in 1995 when Stemmer et al. (7) reported the one-step PCA synthesis of a 2.7-kb plasmid that was purified by antibiotic selection. Smith et al. (4) assembled the 5,386-bp φX174 bacteriophage genome from a single pool of chemically synthesized oligonucleotides by using a combination of ligation and PCA methods, but purification of the product again required biological selection. In 2002, Cello et al. (10) described a stepwise synthesis of a 7,558-bp poliov...
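The overlapping, complementary oligonucleotides that PCA starts from can be illustrated with a minimal tiling sketch. It cuts a target sequence into fixed-length windows that alternate between the top and bottom strand, so each oligonucleotide can anneal to its neighbors. The 40-base length follows the text above; the 20-base overlap, the function names, and the example target are assumptions made for illustration, and this is not the published design procedure (which would also consider melting temperature, secondary structure, and similar constraints).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(target, length=40, overlap=20):
    """Tile `target` with overlapping oligos on alternating strands.

    Even-numbered oligos are written as top-strand sequence; odd-numbered
    oligos are written as the reverse complement (bottom strand), so that
    neighbors are complementary across their shared `overlap` bases.
    The final oligo may be shorter than `length`.
    """
    if not 0 < overlap < length:
        raise ValueError("overlap must be between 0 and length (exclusive)")
    target = target.upper()
    step = length - overlap
    oligos = []
    start = 0
    index = 0
    while True:
        piece = target[start:start + length]
        oligos.append(piece if index % 2 == 0 else reverse_complement(piece))
        if start + length >= len(target):
            break
        start += step
        index += 1
    return oligos


def rebuild(oligos, overlap=20):
    """Reconstruct the top strand from a tiling, as a consistency check."""
    sense = [o if i % 2 == 0 else reverse_complement(o) for i, o in enumerate(oligos)]
    return sense[0] + "".join(s[overlap:] for s in sense[1:])


# Hypothetical 100-base target used only to exercise the functions.
target = "ACGTTGCAAGCTTAGGCATC" * 5
oligos = tile_oligos(target)
assert rebuild(oligos) == target
```

With these defaults every internal base of the target is covered by two oligonucleotides on opposite strands; a smaller overlap would leave single-stranded gaps for the polymerase to fill during the elongation cycles.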
The DNA sequence used to encode a polypeptide can have dramatic effects on its expression. Lack of readily available tools has until recently inhibited meaningful experimental investigation of this phenomenon. Advances in synthetic biology and the application of modern engineering approaches now provide the tools for systematic analysis of the sequence variables affecting heterologous expression of recombinant proteins. We here discuss how these new tools are being applied and how they circumvent the constraints of previous approaches, highlighting some of the surprising and promising results emerging from the developing field of gene engineering.