Engineering proteins and enzymes with desired functionality has broad applications in molecular biology, biotechnology, biomedical sciences, health, and medicine. The vastness of protein sequence space, and of all the possible proteins it represents, poses a considerable barrier to enzyme engineering campaigns based on directed evolution and rational design. The nonlinear effects of coevolution between amino acids in protein sequences complicate this further. Data-driven models increasingly provide scientists with computational tools to navigate the largely unexplored forest of protein variants and catch a glimpse of the rules and effects underlying the topology of sequence space. In this review, we outline a complete theoretical journey through protein engineering methods such as directed evolution and rational design, and reflect on these strategies and on data-driven hybrid strategies in the context of sequence space. We discuss crucial phenomena of residue coevolution, such as epistasis, and review the history of models created over the past decade that aim to infer rules of protein evolution from data and to use this knowledge to improve prediction of the structure-function relationship of proteins. Data-driven models based on deep learning algorithms are among the most promising methods that can account, to some degree, for the nonlinear phenomena of sequence space. We also critically discuss the available models for predicting evolutionary coupling and epistatic effects (classical and deep learning) in terms of their capabilities and limitations. Finally, we present our perspective on possible future directions for developing data-driven approaches and provide key orientation points and necessities for the future of the fast-evolving field of enzyme engineering.