Bacterial biocatalysts play a key role in our transition to a bio-based, post-petroleum economy. However, the discovery of new biocatalysts is currently limited by our ability to analyze genomic information and our capacity to screen functionally for desired activities. Here, we present a simple workflow that combines functional metaproteomics and metagenomics, which facilitates the unmediated and direct discovery of biocatalysts in environmental samples. To identify the entirety of lipolytic biocatalysts in a soil sample contaminated with used cooking oil, we detected all proteins active against a fluorogenic substrate in the sample’s metaproteome using a 2D-gel zymogram. The enzymes’ primary structures were then deduced by tryptic in-gel digest and mass spectrometry of the active protein spots, searching against a metagenome database created from the same contaminated soil sample. We then expressed one of the novel biocatalysts heterologously in Escherichia coli and obtained proof of lipolytic activity. Electronic supplementary material: The online version of this article (doi:10.1186/s40168-017-0247-9) contains supplementary material, which is available to authorized users.
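The identification step in the abstract above (tryptic digest, then a search against a sample-specific metagenome database) can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: real searches compare measured mass spectra with theoretical ones, whereas this sketch compares peptide strings directly. The cleavage rule used (after K or R, but not before P) is the commonly cited trypsin rule, and the function names, ORF names, and sequences are hypothetical and not taken from the study.

```python
def tryptic_digest(sequence, min_length=6):
    """In silico trypsin digest: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    # Very short peptides are rarely informative, so drop them.
    return [p for p in peptides if len(p) >= min_length]


def match_peptides(observed, database):
    """Map each database protein to the observed peptides it could have produced."""
    hits = {}
    for name, seq in database.items():
        found = set(tryptic_digest(seq)) & set(observed)
        if found:
            hits[name] = sorted(found)
    return hits


# Hypothetical metagenome-derived ORFs and one hypothetical observed peptide.
metagenome_db = {"orf_1": "MSTLLGKAVDEFGHIKPPW", "orf_2": "MKKLLAAA"}
hits = match_peptides(["MSTLLGK"], metagenome_db)
```

Building the database from the same sample matters in a sketch like this for the same reason it matters in the workflow: a peptide can only be assigned to a protein whose sequence is present in the database being searched.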
The majority of protein sequence data published today is of metagenomic origin. However, our ability to assign functions to these sequences is often hampered by our general inability to cultivate the larger part of microbial species and the sheer amount of sequence data generated in these projects. Here we present a combination of bioinformatics, synthetic biology, and Escherichia coli genetics to discover biocatalysts in metagenomic datasets. We created a subset of the Global Ocean Sampling dataset, the largest metagenomic project published to date, by removing all proteins that matched Hidden Markov Models of known protein families from PFAM and TIGRFAM with high confidence (E-value > 10⁻⁵). This essentially left us with proteins with low or no homology to known protein families, still encompassing ∼1.7 million different sequences. In this subset, we then identified protein families de novo with a Markov clustering algorithm. For each protein family, we defined a single representative based on its phylogenetic relationship to all other members in that family. This reduced the dataset to ∼17,000 representatives of protein families with more than 10 members. Based on conserved regions typical for lipases and esterases, we selected a representative gene from a family of 27 members for synthesis. This protein, when expressed in E. coli, showed lipolytic activity toward para-nitrophenyl (pNP) esters. The Km value of the enzyme was 66.68 μM for pNP-butyrate and 68.08 μM for pNP-palmitate, with kcat/Km values of 3.4 × 10⁶ and 6.6 × 10⁵ M⁻¹ s⁻¹, respectively. Hydrolysis of model substrates showed enantiopreference for the R-form. Reactions yielded 43 and 61% enantiomeric excess of products with ibuprofen methyl ester and 2-phenylpropanoic acid ethyl ester, respectively.
The enzyme retains 50% of its maximum activity at temperatures as low as 10 °C, and its activity is enhanced in artificial seawater and in buffers with higher salt concentrations, with an optimum osmolarity of 3,890 mosmol/l.
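The kinetic and stereoselectivity figures reported in this abstract can be unpacked with a little arithmetic. The Python sketch below derives an implied turnover number from the stated Km and kcat/Km values, and converts an enantiomeric excess into the fraction of the major enantiomer. The derived kcat values are not reported in the abstract; they follow only from multiplying the two published numbers, and the function names are hypothetical.

```python
def kcat_from_efficiency(efficiency_per_M_s, km_uM):
    """k_cat (s^-1) = (k_cat / K_m) * K_m, with K_m converted from micromolar to molar."""
    return efficiency_per_M_s * km_uM * 1e-6


def major_fraction_from_ee(ee_percent):
    """Fraction of the major enantiomer, from ee = (major - minor) / (major + minor)."""
    return (1 + ee_percent / 100) / 2


# Values as stated in the abstract.
kcat_butyrate = kcat_from_efficiency(3.4e6, 66.68)   # pNP-butyrate
kcat_palmitate = kcat_from_efficiency(6.6e5, 68.08)  # pNP-palmitate

# 43% and 61% ee correspond to these major-enantiomer fractions.
frac_ibuprofen_ester = major_fraction_from_ee(43)
frac_phenylpropanoate = major_fraction_from_ee(61)
```

Under these assumptions the implied kcat is roughly 227 s⁻¹ for pNP-butyrate and roughly 45 s⁻¹ for pNP-palmitate, and an ee of 43% or 61% means the major enantiomer makes up about 71.5% or 80.5% of the product, respectively.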
Background: With the development of Next Generation Sequencing technologies, the number of predicted proteins from entire (meta-) genomes has risen exponentially. While for some of these sequences protein functions can be inferred from homology, an experimental characterization is still a requirement for the determination of protein function. However, functional characterization of proteins cannot keep pace with our capabilities to generate more and more sequence data. Results: Here, we present an approach to reduce the number of proteins from entire (meta-) genomes to a reasonably small number for further experimental characterization without loss of important information. About 6.1 million predicted proteins from the Global Ocean Sampling Expedition Metagenome project were distributed into classes based either on homology to existing hidden Markov models (HMMs) of known families, or de novo by assessment of pairwise similarity. 5.1 million of these proteins could be classified in this way, yielding 18,437 families. For 4,129 protein families, which did not match existing HMMs from databases, we could create novel HMMs. For each family, we then selected a representative protein, which showed the closest homology to all other proteins in this family. We then selected representatives of four families based on their homology to known and well-characterized lipases. From these four synthesized genes, we could obtain the novel esterase/lipase GOS54, validating our approach. Conclusions: Using an in silico approach, we were able to improve the success rate of functional screening and make entire (meta-) genomes amenable for biochemical characterization. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1668-y) contains supplementary material, which is available to authorized users.
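The representative-selection step described in this abstract (one protein per family, chosen for its closeness to all other members) resembles picking a medoid. The Python sketch below shows that idea under stated assumptions: the abstract does not specify the scoring used, so the mean pairwise similarity here is an illustrative stand-in, and the member names and similarity values are hypothetical.

```python
def pick_representative(members, similarity):
    """Return the member with the highest mean similarity to all other members.

    `similarity` maps frozenset({a, b}) to a pairwise score (higher = more alike).
    """
    if len(members) == 1:
        return members[0]

    def mean_similarity(candidate):
        others = [m for m in members if m != candidate]
        return sum(similarity[frozenset((candidate, o))] for o in others) / len(others)

    return max(members, key=mean_similarity)


# Hypothetical three-member family with made-up pairwise similarity scores.
family = ["p1", "p2", "p3"]
scores = {
    frozenset(("p1", "p2")): 0.9,
    frozenset(("p1", "p3")): 0.8,
    frozenset(("p2", "p3")): 0.4,
}
representative = pick_representative(family, scores)
```

In this toy family, p1 is chosen because it is the only member that is similar to both of the others, which is the property that makes a single synthesized gene a reasonable proxy for its whole family.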
Environmental sequence data of microbial communities now makes up the majority of public genomic information. The assignment of a function to sequences from these metagenomic sources is challenging because organisms associated with the data are often uncharacterized and not cultivable. To overcome these challenges, we created a rationally designed expression library of metagenomic proteins covering the sequence space of the thioredoxin superfamily. This library of 100 individual proteins represents more than 22,000 thioredoxins found in the Global Ocean Sampling data set. We screened this library for the functional rescue of Escherichia coli mutants lacking the thioredoxin-type reductase (ΔtrxA), isomerase (ΔdsbC), or oxidase (ΔdsbA). We were able to assign functions to more than a quarter of our representative proteins. The in vivo function of a given representative could not be predicted by phylogenetic relation but did correlate with the predicted isoelectric surface potential of the protein. Selected proteins were then purified, and we determined their activity using a standard insulin reduction assay and measured their redox potential. An unexpected gel shift of protein E5 during the redox potential determination revealed a redox cycle distinct from that of typical thioredoxin-superfamily oxidoreductases. Instead of the intramolecular disulfide bond formation typical for thioredoxins, this protein forms an intermolecular disulfide between the attacking cysteines of two separate subunits during its catalytic cycle. Our functional metagenomic approach proved not only useful to assign in vivo functions to representatives of thousands of proteins but also uncovered a novel reaction mechanism in a seemingly well-known protein superfamily.