pH regulates protein structures and the associated functions in
many biological processes via protonation and deprotonation of ionizable
side chains, where the titration equilibria are determined by pKa
values. To accelerate pH-dependent molecular
mechanism research in the life sciences and industrial protein and
drug design, fast and accurate pKa prediction
is crucial. Here we present a theoretical pKa
data set, PHMD549, which was successfully applied to four
distinct machine learning methods, including DeepKa, which was proposed
in our previous work. To enable a valid comparison, EXP67S was selected
as the test set. Encouragingly, DeepKa was significantly improved
and outperformed other state-of-the-art methods, except for constant-pH
molecular dynamics, which was used to create PHMD549. More importantly,
DeepKa reproduced the experimental pKa orders
of acidic dyads in the catalytic sites of five enzymes. Apart from structured
proteins, DeepKa was found to be applicable to intrinsically disordered
peptides. Further, by combining the predictions with solvent exposure, we find
that DeepKa offers the most accurate predictions in the challenging
case where hydrogen bonding or salt bridge interactions are partly
compensated by desolvation of a buried side chain. Finally, our benchmark
data qualify PHMD549 and EXP67S as the basis for future development
of protein pKa prediction tools driven
by artificial intelligence. In addition, DeepKa built on PHMD549 has
proven to be an efficient protein pKa predictor
and thus can be applied immediately to, for example, pKa
database construction, protein design, and drug discovery.
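To make concrete how a pKa determines the titration equilibrium of an ionizable side chain, the following is a minimal sketch of the Henderson-Hasselbalch relation for a single, non-interacting site. It is not part of DeepKa or PHMD549; the function names are hypothetical and chosen for illustration only. Coupled sites such as the acidic dyads mentioned above generally deviate from this idealized single-site curve, which is one reason structure-based predictors are needed in the first place.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Fraction of a single, non-interacting ionizable site that is protonated.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    At pH == pKa the site is exactly half protonated.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def average_charge(ph: float, pka: float, acidic: bool) -> float:
    """Average charge of the site (hypothetical helper for illustration).

    Acidic groups (e.g. Asp, Glu) carry -1 when deprotonated and 0 when protonated;
    basic groups (e.g. Lys, His) carry +1 when protonated and 0 when deprotonated.
    """
    fp = protonated_fraction(ph, pka)
    return -(1.0 - fp) if acidic else fp


if __name__ == "__main__":
    # Illustrative pKa value only, not taken from PHMD549 or EXP67S.
    for ph in (3.0, 4.0, 5.0, 7.0):
        print(ph, round(protonated_fraction(ph, 4.0), 3), round(average_charge(ph, 4.0, acidic=True), 3))
```

Because each pH unit above the pKa shifts the ratio [A-]/[HA] by a factor of ten, even an error of about one pKa unit in a prediction can change the predicted protonation state of a site near physiological pH substantially.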