In studying the strength and specificity of interaction between members
of two protein families, key questions center on which pairs of
possible partners actually interact, how well they interact,
and why they interact while others do not. The advent of
large-scale experimental studies of interactions between members of a target
family and a diverse set of possible interaction partners offers an opportunity
to address these questions. Here we develop a method, DgSpi
(Data-driven Graphical models of Specificity in Protein:protein Interactions),
for learning and using graphical models that explicitly represent the amino acid
basis for interaction specificity (why) and extend earlier
classification-oriented approaches (which) to predict the
ΔG of binding (how well). We
demonstrate the effectiveness of our approach in analyzing and predicting
interactions between a set of 82 PDZ recognition modules and a panel of 217
possible peptide partners, based on data from MacBeath and colleagues. Our
predicted ΔG values closely track the
experimentally measured ones, reaching correlation coefficients of 0.69 in
10-fold cross-validation and 0.63 in leave-one-PDZ-out cross-validation.
Furthermore, the model serves as a compact representation of amino acid
constraints underlying the interactions, so that protein-level
ΔG predictions can be interpreted directly in terms
of residue-level constraints. Finally, as a generative model,
DgSpi readily enables the design of new interacting
partners, and we show that the designed ligands are both novel and diverse.
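To illustrate how a graphical model can tie a protein-level ΔG to residue-level constraints, the sketch below scores a PDZ:peptide pair as a bias plus a sum of pairwise potentials between amino acids at coupled PDZ and peptide positions. The edge list, the potential table, and the function names `edge_contributions` and `predict_dg` are hypothetical; DgSpi's actual graph structure and parameterization are not specified here.

```python
from typing import Dict, List, Tuple

# Hypothetical coupled positions: (PDZ position, peptide position).
Edge = Tuple[int, int]
# Hypothetical potentials keyed by (edge, PDZ residue, peptide residue).
Potentials = Dict[Tuple[Edge, str, str], float]


def edge_contributions(pdz_seq: str, peptide_seq: str, edges: List[Edge],
                       potentials: Potentials) -> Dict[Edge, float]:
    """Residue-level view: the dG contribution of each coupled position pair.

    Amino-acid pairs missing from the table contribute 0 (an assumption).
    """
    return {
        (i, j): potentials.get(((i, j), pdz_seq[i], peptide_seq[j]), 0.0)
        for (i, j) in edges
    }


def predict_dg(pdz_seq: str, peptide_seq: str, edges: List[Edge],
               potentials: Potentials, bias: float = 0.0) -> float:
    """Protein-level view: bias plus the sum of the residue-level contributions."""
    return bias + sum(edge_contributions(pdz_seq, peptide_seq, edges, potentials).values())


edges = [(0, 0), (2, 1)]
potentials = {((0, 0), "G", "V"): -1.5, ((2, 1), "F", "L"): -0.75}
print(predict_dg("GLF", "VL", edges, potentials, bias=-5.0))  # -7.25
print(edge_contributions("GLF", "VL", edges, potentials))
```

Because the total is an explicit sum over edges, each predicted ΔG can be traced back to the specific residue pairs that drive it, which is the kind of interpretability the additive structure is meant to provide.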