To correctly predict amino acid identities within natural proteins, protein language models (PLMs) must implicitly learn the distributional constraints on protein sequences that have been maintained over the course of evolution. As a consequence, the sequence-level and mutation-level likelihoods of such models form effective zero-shot predictors of mutation effects. Although various schemes have been proposed for exploiting the distributional knowledge captured by PLMs to enhance supervised fitness prediction and design, the lack of head-to-head comparisons across different prediction strategies and different classes of PLM has made it challenging to identify the best-performing methods and to understand the factors contributing to performance. Here, we extend previously proposed ranking-based loss functions to adapt the likelihoods of family-based and masked protein language models, and demonstrate that the best configurations outperform state-of-the-art approaches based on frozen embeddings in the low-data setting. Furthermore, we propose ensembling strategies that exploit the strong dependence of the mutational distributions learned by PLMs on sequence context, and show that these strategies can be used to guide efficient optimisation over fitness landscapes.