Esterases receive special attention because of their wide distribution in biological systems and environments and their importance for physiology and chemical synthesis. How to predict the substrate promiscuity level of esterases from sequence data, and the molecular reasons why some of these enzymes are more promiscuous than others, remain to be elucidated. This limits the surveillance of the sequence space for esterases that could lead to new versatile biocatalysts and new insights into their role in cellular function. Here, we performed an extensive analysis of the substrate spectra of 145 phylogenetically and environmentally diverse microbial esterases tested with 96 diverse esters. We determined the primary factors shaping their substrate range by analyzing substrate range patterns in combination with structural analysis and protein-ligand simulations. We found a structural parameter that helps rank (classify) the promiscuity level of esterases from sequence data with 94% accuracy. This parameter, the active site effective volume, captures the topology of the catalytic environment by measuring the active site cavity volume corrected by the relative solvent accessible surface area (SASA) of the catalytic triad. Sequences encoding esterases with active site effective volumes (cavity volume/SASA) above a threshold show broader substrate spectra, which can be further extended in combination with phylogenetic data. This measure also provides a valuable tool for interrogating substrates capable of being converted. Found to be transferable to phosphatases of the haloalkanoic acid dehalogenase superfamily and possibly other enzymatic systems, it represents a powerful tool for low-cost bioprospecting for esterases with broad substrate ranges in large-scale sequence data sets.
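The ranking parameter described above (cavity volume divided by the relative SASA of the catalytic triad, compared against a cutoff) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and function names, the units and scale of the inputs, and the cutoff used in the usage note are all hypothetical, since the abstract states neither the threshold value nor how the inputs are scaled.

```python
from dataclasses import dataclass


@dataclass
class ActiveSite:
    """Hypothetical container for the two inputs named in the abstract."""
    name: str
    cavity_volume: float  # active site cavity volume (unit assumed, e.g. cubic angstroms)
    relative_sasa: float  # relative SASA of the catalytic triad (scale assumed, must be > 0)


def effective_volume(site: ActiveSite) -> float:
    """Return cavity volume / relative SASA, the ratio given in the abstract."""
    if site.relative_sasa <= 0:
        raise ValueError("relative SASA must be positive")
    return site.cavity_volume / site.relative_sasa


def is_predicted_promiscuous(site: ActiveSite, threshold: float) -> bool:
    """Classify a site as broad-spectrum if its effective volume exceeds the cutoff.

    The threshold is a caller-supplied assumption; the abstract does not give it.
    """
    return effective_volume(site) > threshold
```

For example, with an invented cutoff of 200, a site with cavity volume 500 and relative SASA 0.5 (effective volume 1000) would be ranked as broad-spectrum, while a site with cavity volume 100 and relative SASA 0.8 (effective volume 125) would not.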
The α/β‐hydrolase fold family is highly diverse in sequence, structure and biochemical function. To investigate the sequence–structure–function relationships, the Lipase Engineering Database (https://led.biocatnet.de) was updated. Overall, 280 638 protein sequences and 1557 protein structures were analysed. All α/β‐hydrolases consist of the catalytically active core domain, but they might also contain additional structural modules, resulting in 12 different architectures: core domain only, additional lids at three different positions, three different caps, additional N‐ or C‐terminal domains and combinations of N‐ and C‐terminal domains with caps and lids respectively. In addition, the α/β‐hydrolases were distinguished by their oxyanion hole signature (GX‐, GGGX‐ and Y‐types). The N‐terminal domains show two different folds, the Rossmann fold or the β‐propeller fold. The C‐terminal domains show a β‐sandwich fold. The N‐terminal β‐propeller domain and the C‐terminal β‐sandwich domain are structurally similar to carbohydrate‐binding proteins such as lectins. The classification was applied to the newly discovered polyethylene terephthalate (PET)‐degrading PETases and MHETases, which are core domain α/β‐hydrolases of the GX‐ and the GGGX‐type respectively. To investigate evolutionary relationships, sequence networks were analysed. The degree distribution followed a power law with a scaling exponent γ = 1.4, indicating a highly inhomogeneous network which consists of a few hubs and a large number of less connected sequences. The hub sequences have many functional neighbours and therefore are expected to be robust toward possible deleterious effects of mutations. The cluster size distribution followed a power law with an extrapolated scaling exponent τ = 2.6, which strongly supports the connectedness of the sequence space of α/β‐hydrolases. 
Database: Supporting data about domains from other proteins with structural similarity to the N‐ or C‐terminal domains of α/β‐hydrolases are available in the Data Repository of the University of Stuttgart (DaRUS) under doi: https://doi.org/10.18419/darus-458.
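The α/β‐hydrolase abstract above reports power-law scaling exponents for the degree and cluster size distributions of the sequence network. One common way to estimate such an exponent is the maximum-likelihood estimator for a continuous power law; the sketch below shows that estimator only as an illustration. The abstract does not say how the authors fitted their exponents, the function name is hypothetical, and applying the continuous formula to integer node degrees is an approximation.

```python
import math
from typing import Sequence


def powerlaw_exponent_mle(values: Sequence[float], x_min: float = 1.0) -> float:
    """Estimate the exponent of p(x) ~ x^(-gamma) for x >= x_min.

    Uses the continuous maximum-likelihood estimator
        gamma = 1 + n / sum(ln(x_i / x_min)),
    where the sum runs over the n values with x_i >= x_min.
    """
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    if log_sum <= 0:
        raise ValueError("need at least one value strictly greater than x_min")
    return 1.0 + len(tail) / log_sum
```

In practice the input would be the list of node degrees (or cluster sizes) taken from the sequence network, and the choice of `x_min` strongly affects the estimate, so it should be reported alongside the exponent.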
Petroleum-based plastics are durable and accumulate in all ecological niches. Knowledge on enzymatic degradation is sparse. Today, fewer than 50 verified plastics-active enzymes are known. First examples of enzymes acting on the polymers polyethylene terephthalate (PET) and polyurethane (PUR) have been reported together with a detailed biochemical and structural description. Furthermore, very few polyamide (PA) oligomer-active enzymes are known. In this article, the currently known enzymes acting on the synthetic polymers PET and PUR are briefly summarized, and their published activity data were collected and integrated into a comprehensive open-access database. The Plastics-Active Enzymes Database (PAZy) represents an inventory of known and experimentally verified enzymes that act on synthetic fossil fuel-based polymers. Almost 3000 homologs of PET-active enzymes were identified by profile hidden Markov models. Over 2000 homologs of PUR-active enzymes were identified by BLAST. Based on multiple sequence alignments, conservation analysis identified the most conserved amino acids, and sequence motifs for PET- and PUR-active enzymes were derived.
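The conservation analysis mentioned above can be illustrated with a simple per-column measure: for each column of a multiple sequence alignment, report the most frequent residue and the fraction of sequences that carry it. This is only a sketch under stated assumptions; the abstract does not specify which conservation score was used, and the function name and the toy alignment in the usage note are invented for illustration.

```python
from collections import Counter
from typing import List, Tuple


def column_conservation(alignment: List[str], gap: str = "-") -> List[Tuple[str, float]]:
    """Return (most common residue, fraction of sequences carrying it) per column.

    Gaps are never reported as the conserved residue, but they still count
    in the denominator, so gappy columns score lower.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    result = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            # Column consists only of gaps
            result.append((gap, 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append((residue, count / len(column)))
    return result
```

Applied to the invented alignment `["GHSMG", "GWSQG", "GYS-G"]`, columns 1, 3 and 5 are fully conserved (G, S, G) while columns 2 and 4 are variable, which is how a motif pattern such as G-x-S-x-G would emerge from such scores.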
The Short‐chain Dehydrogenases/Reductases Engineering Database (SDRED) covers one of the largest known protein families (168 150 proteins). Assignment to the superfamilies of Classical and Extended SDRs was achieved by global sequence similarity and by identification of family‐specific sequence motifs. Two standard numbering schemes were established for Classical and Extended SDRs that allow for the determination of conserved amino acid residues, such as cofactor specificity determining positions or superfamily specific sequence motifs. The comprehensive sequence dataset of the SDRED facilitates the refinement of family‐specific sequence motifs. The glycine‐rich motifs for Classical and Extended SDRs were refined to improve the precision of superfamily classification. In each superfamily, the majority of sequences formed a tightly connected sequence network and belonged to a large homologous family. Despite their different sequence motifs and their different sequence length, the two sequence networks of Classical and Extended SDRs are not separate, but connected by edges at a threshold of 40% sequence similarity, indicating that all SDRs belong to a large, connected network. The SDRED is accessible at https://sdred.biocatnet.de/.
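The network statement above (two superfamily networks that become one connected network once edges at 40% sequence similarity are included) can be illustrated by counting connected components at a given similarity threshold. The sketch below uses a union-find structure; the function name, the input format, and the example identifiers and similarity values in the usage note are hypothetical and not taken from the SDRED.

```python
from typing import Dict, Iterable, List, Set, Tuple


def connected_components(
    nodes: Iterable[str],
    similarities: Dict[Tuple[str, str], float],
    threshold: float = 0.40,
) -> List[Set[str]]:
    """Group sequences into components, joining pairs with similarity >= threshold.

    `nodes` must contain every sequence identifier used in `similarities`.
    """
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path along the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), sim in similarities.items():
        if sim >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())
```

With invented sequences `c1`, `c2` (Classical) and `e1`, `e2` (Extended), within-family similarities of 0.80 and 0.75, and a single cross-family pair at 0.41, the function returns one component at a threshold of 0.40 and two components at 0.50, mirroring the qualitative behaviour described in the abstract.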
The development of novel enzymes for biocatalytic processes requires knowledge on substrate profile and selectivity; this can be derived from databases and from publications. Often, these sources lack time-course data for the substrate or product, and an unambiguous link between experiment and enzyme sequence. The lack of integrated, original data hampers the comprehensive analysis of enzyme kinetics and the evaluation of sequence-function relationships. In order to accelerate enzyme engineering, BioCatNet integrates protein sequence, protein structure, and experimental data for a given enzyme family. BioCatNet explicitly assigns the enzyme sequence to the experimental data, which consists of information on reaction conditions and time-course data. BioCatNet facilitates the consistent documentation of reaction conditions, the archiving of time-course data, and the efficient exchange of experimental data among collaborators. Data integration is demonstrated for three case studies by using the TEED (Thiamine diphosphate-dependent Enzymes Engineering Database).