The medically relevant field of protein-based therapeutics has
triggered a demand for protein engineering in different pH environments
of biological relevance. In silico engineering workflows
typically employ high-throughput screening campaigns that require
evaluating large sets of protein residues and point mutations by fast
yet accurate computational algorithms. While several high-throughput
pKa prediction methods exist, their accuracies are unclear
because no current, comprehensive benchmark is available.
Here, seven fast, efficient, and accessible approaches including PROPKA3,
DeepKa, PKAI, PKAI+, DelPhiPKa, MCCE2, and H++ were systematically
tested on a nonredundant subset of 408 measured protein residue pKa
shifts from the pKa database (PKAD). While no method outperformed the
null hypotheses
with confidence, as illustrated by statistical bootstrapping, DeepKa,
PKAI+, PROPKA3, and H++ had utility. More specifically, DeepKa consistently
performed well in tests across multiple and individual amino acid
residue types, as reflected by lower errors, higher correlations,
and improved classifications. Arithmetic averaging of the best empirical
predictors into simple consensuses improved overall transferability
and accuracy, reaching a root-mean-square error of 0.76 pKa units
and a correlation coefficient (R²) of 0.45 with experimental pKa
shifts. This analysis should provide a basis for further methodological
developments and guide future applications, which require embedding
of computationally inexpensive pKa prediction
methods, such as the optimization of antibodies for pH-dependent antigen
binding.
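
The evaluation described above (arithmetic consensus of several predictors, root-mean-square error and R² against measured shifts, and a bootstrap comparison with a null model) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the toy numbers, the use of the squared Pearson correlation for R², the choice of a zero-shift prediction as the null model, and the percentile form of the bootstrap interval are all assumptions made for illustration.

```python
import math
import random


def consensus(predictions):
    """Arithmetic mean across methods for each residue.

    `predictions` is a list of per-method lists in the same residue order.
    """
    return [sum(values) / len(values) for values in zip(*predictions)]


def rmse(predicted, measured):
    """Root-mean-square error between predicted and measured pKa shifts."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r2(predicted, measured):
    """Squared Pearson correlation (assumed definition of R² here)."""
    n = len(measured)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        return 0.0  # correlation is undefined for constant input
    return cov ** 2 / (var_p * var_m)


def bootstrap_rmse_interval(predicted, measured, n_resamples=1000, seed=0):
    """Approximate 95% percentile interval of the RMSE.

    Residues are resampled with replacement; prediction and measurement
    stay paired in every resample.
    """
    rng = random.Random(seed)
    n = len(measured)
    samples = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(rmse([predicted[i] for i in idx],
                            [measured[i] for i in idx]))
    samples.sort()
    lo_idx = (n_resamples * 25) // 1000
    hi_idx = max(lo_idx, (n_resamples * 975) // 1000 - 1)
    return samples[lo_idx], samples[hi_idx]


# Toy, made-up pKa shifts for four residues (not data from PKAD).
measured = [0.5, -1.0, 1.5, 0.0]
method_a = [0.4, -0.6, 1.0, 0.2]   # hypothetical predictor A
method_b = [0.8, -1.0, 1.4, -0.2]  # hypothetical predictor B
null_model = [0.0] * len(measured)  # assumed null: no shift predicted

consensus_pred = consensus([method_a, method_b])
consensus_rmse = rmse(consensus_pred, measured)
null_rmse = rmse(null_model, measured)
consensus_r2 = pearson_r2(consensus_pred, measured)
consensus_interval = bootstrap_rmse_interval(consensus_pred, measured)
null_interval = bootstrap_rmse_interval(null_model, measured)
```

In a sketch like this, a method would be said to outperform the null model "with confidence" only if its bootstrap interval lay below that of the null model without overlap. With realistic data, the lists would hold one entry per benchmarked residue.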