We propose a differential equation model for gene expression and provide two methods to construct the model from a set of temporal data. We model both transcription and translation by kinetic equations with feedback loops from translation products to transcription. Degradation of proteins and mRNAs is also incorporated. We study two methods to construct the model from experimental data: Minimum Weight Solutions to Linear Equations (MWSLE), which determines the regulation by solving under-determined linear equations, and Fourier Transform for Stable Systems (FTSS), which refines the model with cell cycle constraints. The results suggest that a minor set of temporal data may be sufficient to construct the model at the genome level. We also give a comprehensive discussion of other extended models: the RNA Model, the Protein Model, and the Time Delay Model.
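To make the structure described in the abstract above concrete, here is a minimal sketch of a transcription/translation model with feedback and degradation. It assumes linear kinetics, which the abstract does not state, and every name in it (`simulate`, the feedback matrix `C`, the translation rates `L`, and the mRNA and protein degradation rates `V` and `U`) is hypothetical and chosen only for illustration. It integrates dr/dt = C p - V r and dp/dt = L r - U p with a plain forward-Euler step; it does not implement MWSLE or FTSS.

```python
def simulate(C, L, V, U, r0, p0, dt=0.01, steps=1000):
    """Forward-Euler integration of an assumed linear gene expression model.

    r[i] is the mRNA level and p[i] the protein level of gene i.
    C[i][j] is the (assumed linear) effect of protein j on transcription of gene i.
    L[i] is the translation rate; V[i] and U[i] are mRNA and protein degradation rates.
    """
    n = len(r0)
    r, p = list(r0), list(p0)
    for _ in range(steps):
        # Transcription driven by protein feedback, minus mRNA degradation.
        dr = [sum(C[i][j] * p[j] for j in range(n)) - V[i] * r[i] for i in range(n)]
        # Translation from mRNA, minus protein degradation.
        dp = [L[i] * r[i] - U[i] * p[i] for i in range(n)]
        r = [r[i] + dt * dr[i] for i in range(n)]
        p = [p[i] + dt * dp[i] for i in range(n)]
    return r, p


if __name__ == "__main__":
    # One gene with no feedback: mRNA decays while protein rises, then decays.
    r, p = simulate(C=[[0.0]], L=[1.0], V=[1.0], U=[1.0], r0=[1.0], p0=[0.0])
    print(r, p)
```

Forward Euler is used only to keep the sketch dependency-free; a stiff or large system would call for a proper ODE solver.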
Protein engineering has enormous academic and industrial potential. However, it is limited by the lack of experimental assays that are consistent with the design goal and sufficiently high-throughput to find rare, enhanced variants. Here we introduce a machine learning-guided paradigm that can use as few as 24 functionally assayed mutant sequences to build an accurate virtual fitness landscape and screen ten million sequences via in silico directed evolution. As demonstrated in two highly dissimilar proteins, avGFP and TEM-1 β-lactamase, top candidates from a single round are diverse and as active as engineered mutants obtained from previous multi-year, high-throughput efforts. Because it distills information from both global and local sequence landscapes, our model approximates protein function even before receiving experimental data, and generalizes from only single mutations to propose high-functioning epistatically non-trivial designs. With reproducible >500% improvements in activity from a single assay in a 96-well plate, we demonstrate the strongest generalization observed in machine-learning guided protein function optimization to date. Taken together, our approach enables efficient use of resource intensive high-fidelity assays without sacrificing throughput, and helps to accelerate engineered proteins into the fermenter, field, and clinic.
Nature provides abundant examples of protein families with highly diverged sequences. The ability to design new protein homologs has many applications, yet synthetic approaches have been unable to generate similarly diverse protein sequences with functional activity in the lab [1, 2]. New technologies offer a solution: high-throughput DNA synthesis and sequencing technologies allow thousands of designed sequences to be assayed in parallel, enabling deep diversification guided by machine learning (ML) models that relate protein sequence to function without detailed biophysical or mechanistic modeling. Here we apply deep learning to design novel adeno-associated virus (AAV) capsid proteins, a challenging target of great utility for gene therapy. Focusing on a 28-amino acid segment spanning buried and exposed regions, we generated 201,426 highly diverse variants of the AAV2 wildtype (WT) sequence, yielding 110,689 viable synthetic capsids, 57,348 of which surpass the average diversity of natural AAV serotype sequences with 12-29 mutations across this region. Even when trained on limited data, deep neural network models accurately predicted capsid viability across highly diverse variants. Deep diversification enables the design of AAV capsids with completely synthetic sequences for the universal treatment of all patients regardless of prior exposure to natural AAV, while demonstrating a general approach that makes vast areas of functional but previously unreachable sequence space accessible.
EK, PJO, NJ, SS, GMC performed research while at Harvard University and EK, SS also performed research while at Dyno Therapeutics. EK, SS, and GMC hold equity at Dyno Therapeutics. A full list of GMC's tech transfer, advisory roles, and funding sources can be found on the lab's website: http://arep.med.harvard.edu/gmc/tech.html. Harvard University has filed a provisional patent application for inventions related to this work.
DHB, AB, LJC, PR performed research as part of their employment at Google LLC. Google is a technology company that sells machine learning services as part of its business.
Data availability: Experimental data for all 3 experiments will be deposited on a public repository (NCBI SRA, https://www.ncbi.nlm.nih.gov/sra, id: SUB7629680) by publication date.
Code availability: The TensorFlow 1.3 API was used to implement and train all models using the architectures described in Methods. The training and validation datasets used for creating each model are available as part of the experimental dataset released as described in the preceding section. The code required to construct the A 39 training data and also to synthesize, process, and analyze the experimental data is provided for download, together with IPython notebooks that reproduce the analysis figures from the main text.
We report here the identification of a previously unknown transcription regulatory element for heat shock (HS) genes in Caenorhabditis elegans. We monitored the expression pattern of 11,917 genes from C. elegans to determine the genes that were up-regulated on HS. Twenty eight genes were observed to be consistently up-regulated in several different repetitions of the experiments. We analyzed the upstream regions of these genes using computational DNA pattern recognition methods. Two potential cis-regulatory motifs were identified in this way. One of these motifs (TTCTAGAA) was the DNA binding motif for the heat shock factor (HSF), whereas the other (GGGTGTC) was previously unreported in the literature. We determined the significance of these motifs for the HS genes using different statistical tests and parameters. Comparative sequence analysis of orthologous HS genes from C. elegans and Caenorhabditis briggsae indicated that the identified DNA regulatory motifs are conserved across related species. The role of the identified DNA sites in regulation of HS genes was tested by in vitro mutagenesis of a green fluorescent protein (GFP) reporter transgene driven by the C. elegans hsp-16-2 promoter. DNA sites corresponding to both motifs are shown to play a significant role in up-regulation of the hsp-16-2 gene on HS. This is one of the rare instances in which a novel regulatory element, identified using computational methods, is shown to be biologically active. The contributions of individual sites toward induction of transcription on HS are nonadditive, which indicates interaction and cross-talk between the sites, possibly through the transcription factors (TFs) binding to these sites.
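The upstream-region analysis described above starts from locating candidate motif sites. The sketch below is a simple illustration and not the authors' pattern recognition method: it counts exact matches of a motif on either DNA strand. The function names and the example sequence are hypothetical; only the two motifs, TTCTAGAA and GGGTGTC, come from the abstract.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_motif(seq, motif):
    """Count exact motif matches on either strand, allowing overlaps.

    A palindromic motif (equal to its own reverse complement) is counted
    once per position, because both strands match at the same window.
    """
    seq = seq.upper()
    motif = motif.upper()
    patterns = {motif, reverse_complement(motif)}
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] in patterns)


if __name__ == "__main__":
    # Made-up upstream sequence containing each motif once.
    upstream = "ACGTTCTAGAACCGGGTGTCAT"
    for motif in ("TTCTAGAA", "GGGTGTC"):
        print(motif, count_motif(upstream, motif))
```

Exact matching is the simplest possible choice; establishing that a motif is significant, as the abstract describes, additionally requires comparing such counts against a background model.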