Engineering proteins and enzymes with the desired functionality has broad applications in molecular biology, biotechnology, biomedical sciences, health, and medicine. The vastness of protein sequence space and all the possible proteins it represents can pose a considerable barrier for enzyme engineering campaigns through directed evolution and rational design. The nonlinear effects of coevolution between amino acids in protein sequences complicate this further. Data-driven models increasingly provide scientists with the computational tools to navigate through the largely undiscovered forest of protein variants and catch a glimpse of the rules and effects underlying the topology of sequence space. In this review, we outline a complete theoretical journey through the processes of protein engineering methods such as directed evolution and rational design and reflect on these strategies and data-driven hybrid strategies in the context of sequence space. We discuss crucial phenomena of residue coevolution, such as epistasis, and review the history of models created over the past decade that aim to infer rules of protein evolution from data and use this knowledge to improve the prediction of the structure-function relationship of proteins. Data-driven models based on deep learning algorithms are among the most promising methods that can account for the nonlinear phenomena of sequence space to some degree. We also critically discuss the available models to predict evolutionary coupling and epistatic effects (classical and deep learning) in terms of their capabilities and limitations. Finally, we present our perspective on possible future directions for developing data-driven approaches and provide key orientation points and necessities for the future of the fast-evolving field of enzyme engineering.