De novo protein design for catalysis of any desired chemical reaction is a long-standing goal in protein engineering because of its broad spectrum of technological, scientific and medical applications. Currently, however, mapping protein sequence to protein function is neither computationally nor experimentally tangible 1,2. Here we developed ProteinGAN, a specialised variant of the generative adversarial network 3 that is able to 'learn' natural protein sequence diversity and enables the generation of functional protein sequences. ProteinGAN learns the evolutionary relationships of protein sequences directly from the complex multidimensional amino acid sequence space and creates new, highly diverse sequence variants with natural-like physical properties. Using malate dehydrogenase as a template enzyme, we show that 24% of the ProteinGAN-generated and experimentally tested sequences are soluble and display wild-type level catalytic activity in the tested conditions in vitro, even in highly mutated (>100 mutations) sequences. ProteinGAN therefore demonstrates the potential of artificial intelligence to rapidly generate highly diverse novel functional proteins within the allowed biological constraints of the sequence space.
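To make the ">100 mutations" figure concrete, the following is a minimal sketch of how substitutions between a generated sequence and a natural reference could be counted. It assumes the two sequences are already aligned to equal length with no gaps; the function names and the toy sequences are hypothetical, and the abstract does not state how the authors counted mutations.

```python
def count_mutations(reference: str, variant: str) -> int:
    """Count positions at which two pre-aligned sequences differ.

    Assumption: both sequences have equal length and contain no gaps,
    so every difference is a substitution.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(a != b for a, b in zip(reference, variant))


def percent_identity(reference: str, variant: str) -> float:
    """Percentage of aligned positions that are identical."""
    if not reference:
        raise ValueError("sequences must be non-empty")
    mutations = count_mutations(reference, variant)
    return 100.0 * (1.0 - mutations / len(reference))


if __name__ == "__main__":
    # Toy sequences for illustration only, not real enzyme sequences.
    natural = "MKVAVLG"
    generated = "MKIAVIG"
    print(count_mutations(natural, generated))
    print(round(percent_identity(natural, generated), 1))
```

A real comparison would first align each generated sequence to its closest natural homolog, since insertions and deletions are not handled by this position-by-position count.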
Understanding the genetic regulatory code governing gene expression is an important challenge in molecular biology. However, how individual coding and non-coding regions of the gene regulatory structure interact and contribute to mRNA expression levels remains unclear. Here we apply deep learning to over 20,000 mRNA datasets to examine the genetic regulatory code controlling mRNA abundance in 7 model organisms ranging from bacteria to human. In all organisms, we can predict mRNA abundance directly from DNA sequence, with up to 82% of the variation of transcript levels encoded in the gene regulatory structure. By searching for DNA regulatory motifs across the gene regulatory structure, we discover that motif interactions could explain the whole dynamic range of mRNA levels. Co-evolution across coding and non-coding regions suggests that it is not single motifs or regions, but the entire gene regulatory structure and specific combination of regulatory elements that define gene expression levels.
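A statement such as "82% of the variation of transcript levels" is commonly expressed as a coefficient of determination (R²) between observed and predicted values. The sketch below shows that calculation under this assumption; the abstract does not specify the exact metric, and the function name and the example numbers are illustrative only.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: observed and predicted are paired values of equal length,
    for example log-transformed mRNA abundances per gene.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: variation the predictions fail to capture.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up numbers for illustration, not data from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 3.0, 3.5]
    print(r_squared(observed, predicted))
```

Under this reading, a value of 0.82 would mean that the sequence-based predictions account for 82% of the variance in measured transcript levels.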
The molecular basis of how temperature affects cell metabolism has been a long-standing question in biology, where the main obstacles are the lack of high-quality data and of methods that link temperature effects to the function of individual proteins and combine them at a systems level. Here we develop and apply a Bayesian modeling approach to resolve the temperature effects in genome-scale metabolic models (GEMs). The approach minimizes uncertainties in enzymatic thermal parameters and greatly improves the predictive strength of the GEMs. The resulting temperature-constrained yeast GEM uncovers enzymes that limit growth at superoptimal temperatures, and squalene epoxidase (ERG1) is predicted to be the most rate-limiting. By replacing this single key enzyme with an ortholog from a thermotolerant yeast strain, we obtain a thermotolerant strain that outgrows the wild type, demonstrating the critical role of sterol metabolism in yeast thermosensitivity. Therefore, apart from identifying thermal determinants of cell metabolism and enabling the design of thermotolerant strains, our Bayesian GEM approach facilitates modelling of complex biological systems in the absence of high-quality data and shows promise for becoming a standard tool for genome-scale modeling.
Horizontal gene transfer via plasmid conjugation enables antimicrobial resistance (AMR) to spread among bacteria and is a major health concern. The range of potential transfer hosts of a particular conjugative plasmid is characterised by its mobility (MOB) group, which is currently determined based on the amino acid sequence of the plasmid-encoded relaxase. To facilitate prediction of plasmid MOB groups, we have developed a bioinformatic procedure based on analysis of the origin-of-transfer (oriT), a merely 230 bp long non-coding plasmid DNA region that is the enzymatic substrate for the relaxase. By computationally interpreting conformational and physicochemical properties of the oriT region, which facilitate relaxase-oriT recognition and initiation of nicking, MOB groups can be resolved with over 99% accuracy. We have shown that oriT structural properties are highly conserved and can be used to discriminate among MOB groups more efficiently than the oriT nucleotide sequence. The procedure for prediction of MOB groups and potential transfer range of plasmids was implemented using published data and is available at http://dnatools.eu/MOB/plasmid.html.
Antimicrobial resistance (AMR) is a pressing global issue, as it diminishes the activity of 29 antibiotics and consequently leads to over 25,000 deaths each year in Europe alone 1,2. The development of AMR in microbial communities is facilitated by horizontal gene transfer (HGT) of conjugative elements (including plasmids and integrative elements) 3 carrying antibiotic resistance genes along with virulence genes 4,5. It is therefore important to determine the routes of plasmid transfer among bacteria 6,7, based on determining their host range 8. It is currently known that each of the 6 established mobility superclasses of conjugative elements has a limited transfer host range 8.
Conjugation systems of each of these MOB groups are classified according to the conservation of the amino acid sequences of relaxase, the central enzyme that enables relaxation and transfer of elements from donor to recipient cells 9,10. Besides relaxases, the relatively conserved nature of MOB groups can be detected among other protein components of conjugation systems, which are composed of (i) auxiliary proteins that take part in formation of the relaxation complex (relaxosome) in the origin of transfer (oriT) DNA region 11, (ii) the coupling protein (type IV) 12,13, which connects the relaxosome with (iii) the mating complex (type IV secretion system, T4SS) that forms the transfer channel between donor and recipient cells 14. These protein components were shown to coevolve to a large extent within their respective MOB groups 12,13,15. In addition to the conserved nature of proteins involved in DNA transfer, it has also been observed that a relaxase from a certain MOB group enables the most efficient transfer only of plasmids belonging to that same group 16. Therefore, one can expect that the substrate for relaxases, the bare noncoding sites in oriT, should also possess some...