Understanding the interactions that make up protein-protein or protein-peptide interfaces is a crucial step towards applications in biotechnology. The ability to discriminate between different partners defines the specificity of a binding protein and is as important as its affinity for the target. Whereas many established computational methods can estimate whether a ligand binds or not, comparing similar ligands remains significantly more challenging. Here we evaluated how well ligand binding specificity can be predicted using three established but conceptually different physics-based methods for protein design. As a model system, we analyzed the binding of peptides to designed armadillo repeat proteins, where a single residue of the peptide was changed systematically, and compared the results with an experimental reference data set. The mutation of a single residue can have a strong impact on binding affinity and specificity, which is difficult to capture in sampling and scoring. We critically assessed the prediction accuracy of the computational methods and found that the performance of each method is affected differently, suggesting that the evaluated methods are best used in a complementary fashion.

Author Summary

Proteins have to recognize other proteins and peptides in the cell with high specificity. Being able to predict such interactions with high precision would be immensely useful for medical and biotechnological applications. Here we tested three computational methods that use physics-based force fields on an experimental dataset and evaluated how well their predictions can be used to discriminate binding pockets at the single-residue level. The predicted values of each method correlated well with the experimentally determined specificities, even though each approach had its biases. Therefore, we correlated the predictions with each other so that the strengths and weaknesses of the approaches complement one another.