Hydrolases are critical components of modern chemical, pharmaceutical, and environmental sciences. Identifying mutations that enhance catalytic efficiency remains a roadblock to designing and discovering new hydrolases for broad academic and industrial uses. Here, we report the statistical profiling of rate-perturbing mutant hydrolases carrying a single amino acid substitution. We constructed an integrated structure–kinetics database for hydrolases, IntEnzyDB, which contains 3907 kcat values, 4175 KM values, and 2715 Protein Data Bank IDs. IntEnzyDB adopts a relational architecture with a flattened data structure, enabling facile and efficient access to clean, tabulated data for machine learning. We conducted statistical analyses of how single amino acid mutations influence the turnover number (i.e., kcat) and catalytic efficiency (i.e., kcat/KM), with particular emphasis on profiling the features of rate-enhancing mutations. The results show that mutations to bulky nonpolar residues with a hydrocarbon side chain are more likely to accelerate the rate than mutations to other types of residues. Linear regression models reveal geometric descriptors of the substrate and the mutated residue that mediate rate-perturbing outcomes for hydrolases with bulky nonpolar mutations. On the basis of the structure–kinetics relationship analyses, we observe that the propensity for rate enhancement is independent of protein size. In addition, distal mutations (i.e., >10 Å from the active site) in hydrolases are significantly more likely to leave efficiency unchanged and less likely to reduce it, but show a similar propensity for rate enhancement. These studies reveal statistical features for identifying rate-enhancing mutations in hydrolases, which could guide hydrolase discovery in biocatalysis.
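The mutation-effect profiling described above can be illustrated with a minimal Python sketch. It assumes a hypothetical 2-fold change in kcat/KM as the cutoff separating enhancing, neutral, and deleterious mutants (the abstract does not state the cutoff used) and applies the >10 Å distal criterion from the text. The function names and toy records are illustrative only and are not IntEnzyDB data.

```python
from collections import Counter

# Hypothetical cutoff: >2-fold gain in kcat/KM = "enhancing",
# >2-fold loss = "deleterious", anything in between = "neutral".
FOLD_CUTOFF = 2.0
# Distal mutations lie >10 Å from the active site, as defined in the text.
DISTAL_CUTOFF_ANGSTROM = 10.0
OUTCOMES = ("enhancing", "neutral", "deleterious")


def classify_effect(eff_mutant, eff_wild_type, cutoff=FOLD_CUTOFF):
    """Label a mutant by its kcat/KM relative to the wild-type enzyme."""
    ratio = eff_mutant / eff_wild_type
    if ratio > cutoff:
        return "enhancing"
    if ratio < 1.0 / cutoff:
        return "deleterious"
    return "neutral"


def location(distance_angstrom):
    """Label a mutation as close to or distal from the active site."""
    return "distal" if distance_angstrom > DISTAL_CUTOFF_ANGSTROM else "close"


def propensities(records):
    """Return the fraction of each outcome among close and distal mutations.

    records: iterable of (kcat/KM of mutant, kcat/KM of wild type,
    distance of the mutated residue to the active site in Å).
    """
    counts = {"close": Counter(), "distal": Counter()}
    for eff_mut, eff_wt, dist in records:
        counts[location(dist)][classify_effect(eff_mut, eff_wt)] += 1
    result = {}
    for loc, counter in counts.items():
        total = sum(counter.values())
        result[loc] = {o: counter[o] / total if total else 0.0 for o in OUTCOMES}
    return result


# Made-up toy records: kcat/KM in M^-1 s^-1, distance in Å.
toy = [
    (5.0e4, 1.0e4, 4.2),   # 5-fold gain, close
    (1.0e3, 1.0e4, 6.0),   # 10-fold loss, close
    (1.2e4, 1.0e4, 12.5),  # 1.2-fold change, distal
    (3.0e4, 1.0e4, 15.0),  # 3-fold gain, distal
]
print(propensities(toy))
```

Reporting fractions per location rather than raw counts keeps the close and distal sets comparable when they differ in size.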
Identifying function-enhancing enzyme variants is a ‘holy grail’ challenge in protein science because it would allow researchers to expand the biocatalytic toolbox for late-stage functionalization of drug-like molecules, environmental degradation of plastics and other pollutants, and medical treatment of food allergies. Data-driven strategies, including statistical modeling, machine learning, and deep learning, have greatly advanced the understanding of sequence–structure–function relationships in enzymes. They have also enhanced the ability to predict and design new enzymes and enzyme variants that catalyze new-to-nature reactions. Here, we review recent progress in data-driven models that can be applied to identify efficiency-enhancing mutants for catalytic reactions. We also discuss existing challenges and obstacles faced by the community. Although this review is by no means comprehensive, we hope the discussion will inform readers about the state of the art in data-driven enzyme engineering and inspire more joint experimental–computational efforts to develop and apply data-driven modeling to innovate biocatalysts for synthetic and pharmaceutical applications.
Data-driven modeling has emerged as a new paradigm for biocatalyst design and discovery. Biocatalytic databases that integrate enzyme structure and function data are urgently needed. Here we describe IntEnzyDB, an integrated structure–kinetics database for facile statistical modeling and machine learning. IntEnzyDB employs a relational database architecture with a flattened data structure, which allows rapid data operations. This architecture also makes it easy for IntEnzyDB to incorporate more types of enzyme function data. IntEnzyDB contains enzyme kinetics and structure data from six Enzyme Commission classes. Using 1050 enzyme structure–kinetics pairs, we investigated the efficiency-perturbing propensities of mutations that are close to or distal from the active site. The statistical results show that efficiency-enhancing mutations are globally encoded and that deleterious mutations are much more likely among close mutations than among distal mutations. Finally, we describe a web interface that allows public users to access the enzymology data stored in IntEnzyDB. IntEnzyDB will provide a computational facility for data-driven modeling in biocatalysis and molecular evolution.
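A flattened relational layout of the kind described above can be sketched with Python's built-in sqlite3 module: each structure–kinetics pair is one row whose kinetic and structural annotations are plain columns, so a close-versus-distal comparison reduces to a single filtered query. The table name, column names, units, and rows below are hypothetical and do not reproduce IntEnzyDB's actual schema.

```python
import sqlite3

# Hypothetical flattened schema: one row per structure-kinetics pair,
# with every annotation stored as a plain column (no nested records).
SCHEMA = """
CREATE TABLE pairs (
    pair_id        INTEGER PRIMARY KEY,
    pdb_id         TEXT NOT NULL,
    ec_class       INTEGER NOT NULL,  -- first EC digit
    mutation       TEXT,              -- NULL for the wild-type enzyme
    kcat_per_s     REAL,              -- kcat in s^-1
    km_mM          REAL,              -- KM in mM
    dist_to_site_A REAL               -- mutated residue to active site, in Å
);
"""

ROWS = [  # placeholder records, not real database entries
    (1, "PDB_A", 3, None, 10.0, 0.5, None),
    (2, "PDB_A", 3, "A10V", 20.0, 0.5, 12.0),
    (3, "PDB_B", 3, "G55L", 2.0, 1.0, 4.0),
    (4, "PDB_C", 2, "S7A", 5.0, 0.25, 18.0),
]


def distal_mutants_by_class(conn, cutoff_angstrom=10.0):
    """Count distal mutants and their mean kcat/KM (mM^-1 s^-1) per EC class."""
    query = """
        SELECT ec_class, COUNT(*), AVG(kcat_per_s / km_mM)
        FROM pairs
        WHERE mutation IS NOT NULL AND dist_to_site_A > ?
        GROUP BY ec_class
        ORDER BY ec_class
    """
    return conn.execute(query, (cutoff_angstrom,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO pairs VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
print(distal_mutants_by_class(conn))
```

Because nothing is nested, adding a new type of function data in this sketch means adding a column rather than restructuring records.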
The Burkholderia cepacia complex (Bcc) is a group of bacteria that includes opportunistic human pathogens. Immunocompromised individuals and cystic fibrosis patients are especially vulnerable to serious infections by these bacteria, motivating the search for compounds with antimicrobial activity against the Bcc. Ubonodin is a lasso peptide with promising activity against Bcc species; it acts by inhibiting RNA polymerase (RNAP) in susceptible bacteria. We constructed a library of over 90 000 ubonodin variants with 2 amino acid substitutions and used a high-throughput screen and next-generation sequencing to examine the fitness of the entire library, generating the most comprehensive data set on lasso peptide activity so far. This screen revealed information about the structure–activity relationship of ubonodin over a large sequence space. Remarkably, the screen identified one variant with not only improved activity compared to wild-type ubonodin but also a submicromolar minimum inhibitory concentration (MIC) against a clinical isolate of the Bcc member Burkholderia cenocepacia. Ubonodin and several of the variants identified in this study had lower MICs against certain Bcc strains than many clinically approved antibiotics. Finally, the large library size enabled us to develop DeepLasso, a deep learning model that can predict the RNAP inhibitory activity of an ubonodin variant.
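The size of a double-substitution library follows from simple counting: choosing 2 of n mutable positions, with 19 alternative canonical residues at each, gives C(n, 2) × 19² variants. The Python sketch below counts and enumerates such variants. Which ubonodin positions were varied, and whether every combination was represented, are not stated here, so the position set and the toy sequence are assumptions for illustration.

```python
from itertools import combinations, product
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues


def count_double_mutants(n_positions, n_alternatives=19):
    """Number of variants with exactly two substitutions among n positions."""
    return comb(n_positions, 2) * n_alternatives ** 2


def double_mutants(sequence, positions=None):
    """Yield every variant of `sequence` carrying exactly two substitutions.

    positions: indices allowed to mutate (default: every position).
    """
    if positions is None:
        positions = range(len(sequence))
    for i, j in combinations(positions, 2):
        for a, b in product(AMINO_ACIDS, repeat=2):
            # Skip "substitutions" that keep the original residue.
            if a != sequence[i] and b != sequence[j]:
                variant = list(sequence)
                variant[i], variant[j] = a, b
                yield "".join(variant)


toy = "GAS"  # hypothetical 3-residue toy sequence, not ubonodin
variants = list(double_mutants(toy))
print(len(variants), count_double_mutants(len(toy)))
```

Because the count grows with the square of the number of positions, even a short peptide yields a library far too large to test variant by variant, which is why pooled screening with sequencing readout is used.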
Reaction dynamics trajectory simulations have been used to predict the product ratios of reactions with post-transition-state bifurcations. However, it remains unknown how the entropy of reactive species along the reaction path mediates ambimodal selectivity. Here, by leveraging a deep generative model, we developed an accelerated entropic path sampling approach that evaluates the change in entropy along the post-transition-state reaction path for each product using merely a few hundred reaction dynamics trajectories. The new method, called bidirectional generative adversarial network–entropic path sampling (BGAN-EPS), can improve the estimation of probability density functions of molecular configurations by generating pseudo-molecular configurations that are statistically indistinguishable from the true data. The method was tested on cyclopentadiene dimerization as a model reaction, for which we reproduced the reference entropic profiles (derived from 2,480 trajectories) using merely 124 trajectories. We further applied the BGAN-EPS method to the NgnD-catalyzed Diels–Alder reaction to investigate the entropic origin of its ambimodal selectivity. The results show that the ambimodal preference for the [6+4]-adduct over the [4+2]-adduct arises from both energetic and entropic contributions.
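The quantity that entropic path sampling tracks can be illustrated with a deliberately simplified estimator: at each time step after the transition state, pool one geometric coordinate across the trajectories leading to a given product, estimate its probability density with a histogram, and compute the differential entropy S/kB = −∫ p ln p dx. The Python sketch below does this for a single coordinate. It is not BGAN-EPS, which works with multidimensional configurational densities and augments sparse samples with generated configurations; the function names, bin count, and input layout are assumptions.

```python
import math


def histogram_entropy(samples, bins=20):
    """Differential entropy (in units of k_B) of 1D samples via a histogram.

    Uses S/k_B ≈ -sum_i p_i * ln(p_i / w), where p_i is the fraction of
    samples in bin i and w is the common bin width.
    """
    lo, hi = min(samples), max(samples)
    if hi == lo:
        raise ValueError("samples must span a nonzero range")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        # Clamp so the maximum value falls in the last bin.
        counts[min(int((x - lo) / width), bins - 1)] += 1
    n = len(samples)
    return -sum((c / n) * math.log((c / n) / width) for c in counts if c > 0)


def entropy_profile(trajectories, bins=20):
    """Entropy at each time step across trajectories ending in one product.

    trajectories: list of equal-length lists; trajectories[k][t] is the
    coordinate value of trajectory k at time step t.
    """
    n_steps = len(trajectories[0])
    return [
        histogram_entropy([traj[t] for traj in trajectories], bins)
        for t in range(n_steps)
    ]


# Toy data: the coordinate's spread doubles from step 0 to step 1.
trajs = [[k / 999, 2 * k / 999] for k in range(1000)]
profile = entropy_profile(trajs)
print(profile)  # second value exceeds the first by roughly ln 2 ≈ 0.69
```

With only a few hundred trajectories per product, such density estimates become noisy, and more so in many dimensions; that sparsity is the problem the generative model in BGAN-EPS is described as addressing.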