Knowing how protein sequence maps to function (the "fitness landscape") is critical for understanding protein evolution as well as for engineering proteins with new and useful properties. We demonstrate that the protein fitness landscape can be inferred from experimental data, using Gaussian processes, a Bayesian learning technique. Gaussian process landscapes can model various protein sequence properties, including functional status, thermostability, enzyme activity, and ligand binding affinity. Trained on experimental data, these models achieve unrivaled quantitative accuracy. Furthermore, the explicit representation of model uncertainty allows for efficient searches through the vast space of possible sequences. We develop and test two protein sequence design algorithms motivated by Bayesian decision theory. The first one identifies small sets of sequences that are informative about the landscape; the second one identifies optimized sequences by iteratively improving the Gaussian process model in regions of the landscape that are predicted to be optimized. We demonstrate the ability of Gaussian processes to guide the search through protein sequence space by designing, constructing, and testing chimeric cytochrome P450s. These algorithms allowed us to engineer active P450 enzymes that are more thermostable than any previously made by chimeragenesis, rational design, or directed evolution.

protein engineering | recombination | machine learning | experimental design | active learning

In the mapping of protein sequence to protein behavior, the phenotype can be envisioned as a surface, or landscape, over the high-dimensional space of possible sequences (1). This "fitness landscape" could describe how the protein contributes to organismal fitness, or it may represent a biophysical property, such as stability, enzyme activity, or ligand binding affinity.
The structure of this surface describes the spectrum of possible phenotypes as well as the mutational accessibility among them and therefore strongly influences protein evolution. This surface is also the objective function for protein engineering, which seeks to identify protein sequences that are highly optimized for a given property or set of properties.

Identifying such optimized sequences is extremely challenging for several reasons. First, the space of possible protein sequences is incomprehensibly large and will never be searched exhaustively by any means, whether naturally, in the laboratory, or computationally (2, 3). Second, within this vast space, functional proteins are extremely scarce, with estimates that range from a high of 1 in 10^11 to a low of 1 in 10^77 (4, 5). Of the sequences that are functional, most have poor fitness, and their numbers decrease exponentially with higher levels of fitness (6, 7). Thus, highly fit sequences are vanishingly rare and overwhelmed by nonfunctional and mediocre sequences.

Computational protein engineering uses models of protein function to guide a search for optimized sequences. These models typically contain an atomic struc...