Machine learning has been increasingly used for protein engineering. However, because the general sequence contexts they capture are not specific to the protein being engineered, the accuracy of existing machine learning algorithms is rather limited. Here, we report ECNet (evolutionary context-integrated neural network), a deep-learning algorithm that exploits evolutionary contexts to predict functional fitness for protein engineering. This algorithm integrates local evolutionary context from homologous sequences that explicitly model residue-residue epistasis for the protein of interest with the global evolutionary context that encodes rich semantic and structural features from the enormous protein sequence universe. As such, it enables accurate mapping from sequence to function and provides generalization from low-order mutants to higher-order mutants. We show that ECNet predicts the sequence-function relationship more accurately as compared to existing machine learning algorithms by using ~50 deep mutational scanning and random mutagenesis datasets. Moreover, we used ECNet to guide the engineering of TEM-1 β-lactamase and identified variants with improved ampicillin resistance with high success rates.
Biosystems such as enzymes, pathways, and whole cells have been increasingly explored for biotechnological applications. However, the intricate connectivity and resulting complexity of biosystems poses a major hurdle in designing biosystems with desirable features. As -omics and other high throughput technologies have been rapidly developed, the promise of applying machine learning (ML) techniques in biosystems design has started to become a reality. ML models enable the identification of patterns within complicated biological data across multiple scales of analysis and can augment biosystems design applications by predicting new candidates for optimized performance. ML is being used at every stage of biosystems design to help find nonobvious engineering solutions with fewer design iterations. In this review, we first describe commonly used models and modeling paradigms within ML. We then discuss some applications of these models that have already shown success in biotechnological applications. Moreover, we discuss successful applications at all scales of biosystems design, including nucleic acids, genetic circuits, proteins, pathways, genomes, and bioprocesses. Finally, we discuss some limitations of these methods and potential solutions as well as prospects of the combination of ML and biosystems design.
There is a growing interest in developing cooperative chemoenzymatic reactions to harness the reactivity of chemical catalysts and the selectivity of enzymes for the synthesis of nonracemic chiral compounds. However, existing chemoenzymatic systems with more than one chemical reaction and one enzymatic reaction working cooperatively are rare. Moreover, the application of oxidoreductases in cooperative chemoenzymatic reactions is limited by the necessity of using expensive and unstable redox equivalents such as nicotinamide cofactors. Here, we report a light-driven cooperative chemoenzymatic system comprised of a photoinduced electron transfer reaction (PET) and a photosensitized energy transfer reaction (PEnT) with an enzymatic reduction in one-pot to synthesize chiral building blocks of bioactive compounds. As a proof of concept, ene-reductase was directly regenerated by PET in the absence of external cofactors. Meanwhile, enzymatic reduction worked cooperatively with photocatalyst-catalyzed energy transfer that continuously replenished the reactive isomer from the less reactive one. The whole system stereoconvergently reduced E/Z mixtures of alkenes to the enantiopure products. Additionally, enantioselective enzymatic reduction worked competitively with photocatalyst-catalyzed racemic background reaction and side reactions to channel the overall electron flow to the single enantiopure product. Such a light-driven cooperative chemoenzymatic system holds great potential for asymmetric synthesis using inexpensive petroleum or biomass-derived alkenes.
Protein engineering seeks to design proteins with improved or novel functions. Compared to rational design and directed evolution approaches, machine learning-guided approaches traverse the fitness landscape more effectively and hold the promise for accelerating engineering and reducing the experimental cost and effort. A critical challenge here is whether we are capable of predicting the function or fitness of unseen protein variants. By learning from the sequence and large-scale screening data of characterized variants, machine learning models predict functional fitness of sequences and prioritize new variants that are very likely to demonstrate enhanced functional properties, thereby guiding and accelerating rational design and directed evolution. While existing generative models and language models have been developed to predict the effects of mutation and assist protein engineering, the accuracy of these models is limited due to their unsupervised nature of the general sequence contexts they captured that is not specific to the protein being engineered. In this work, we propose ECNet, a deep-learning algorithm to exploit evolutionary contexts to predict functional fitness for protein engineering. Our method integrated local evolutionary context from homologous sequences that explicitly model residue-residue epistasis for the protein of interest, as well as the global evolutionary context that encodes rich semantic and structural features from the enormous protein sequence universe. This biologically motivated sequence modeling approach enables accurate mapping from sequence to function and provides generalization from low-order mutants to higher-orders. Through extensive benchmark experiments, we showed that our method outperforms existing methods on~50 deep mutagenesis scanning and random mutagenesis datasets, demonstrating its potential of guiding and expediting protein engineering.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.