“…At the same time, new classes of ML models should be developed for protein fitness prediction to take advantage of uncertainty and introduce helpful inductive biases for the domain. There exist methods that take advantage of inductive biases and prior information about proteins, such as the assumption that most mutation effects are additive or incorporation of biophysical knowledge into models as priors. Another method biases the search toward variants with fewer mutations, which are more likely to be stable and functional. Domain-specific self-supervision has been explored by training models on codons rather than amino acid sequences. There are also efforts to utilize calibrated uncertainty about predicted fitnesses of proteins that lie out of the domain of previously screened proteins from the training set, but there is a need to expand and further test these methods in real settings. It is still an open question whether supervised models can extrapolate beyond their training data to predict novel proteins. More expressive deep learning methods, such as deep kernels, could be explored as an alternative to Gaussian processes for uncertainty quantification in BO. Overall, there is significant potential to improve ML-based protein fitness prediction to help guide the search toward proteins with ideal fitness.…”
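
As a rough illustration of the additivity assumption mentioned in the passage, the sketch below fits a ridge regression on one-hot encoded sequences, so each (position, residue) pair gets one independent weight and a variant's predicted fitness is the sum of its per-site contributions. This is not taken from the cited work: the helper names (`one_hot`, `fit_additive`, `predict`), the ridge penalty, and the three-residue toy sequences and fitness values are all hypothetical and chosen only to make the example run.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical amino acids
AA_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot(seq):
    """Flatten a sequence into a one-hot vector of length len(seq) * 20."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()


def fit_additive(seqs, fitness, ridge=1e-3):
    """Closed-form ridge regression: one weight per (position, residue)."""
    X = np.stack([one_hot(s) for s in seqs])
    y = np.asarray(fitness, dtype=float)
    y_mean = y.mean()  # center the targets instead of fitting an intercept
    A = X.T @ X + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ (y - y_mean))
    return w, y_mean


def predict(seqs, w, y_mean):
    """Predicted fitness is the sum of the per-site weights plus the mean."""
    X = np.stack([one_hot(s) for s in seqs])
    return X @ w + y_mean


# Made-up toy data: a wild type and two single mutants (assumed values).
train_seqs = ["ACD", "GCD", "ACE"]
train_fitness = [1.0, 1.5, 0.8]

w, y_mean = fit_additive(train_seqs, train_fitness)

# The unseen double mutant combines both single mutations.
double_mutant_pred = predict(["GCE"], w, y_mean)[0]
print(round(double_mutant_pred, 2))
```

Under these assumptions the double mutant's prediction lands close to the wild-type fitness plus the two single-mutant effects (1.0 + 0.5 - 0.2). That is the inductive bias at work: the model can extrapolate to unseen combinations, but it cannot represent epistatic (non-additive) interactions between sites.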