Author contributions. WT designed research; MA and CP performed research; MA, CP and WT analyzed data; WT wrote original draft; WT, MA and CP reviewed and edited. MA and CP contributed equally to this work.
ABSTRACT
Here, we investigate the contributions of coevolutive, evolutive and stochastic information in determining protein-protein interactions (PPIs) based on the primary sequences of two interacting protein families A and B. Specifically, under the assumption that coevolutive information is imprinted on the interacting amino acids of two proteins, in contrast to other (evolutive and stochastic) sources spread over their sequences, we dissect those contributions in terms of compensatory mutations at physically coupled and uncoupled amino acids of A and B. We find that physically coupled amino acids at short-range distances store the largest per-contact mutual information content, with a significant fraction of that content resulting from coevolutive sources alone. The information stored in coupled amino acids is further shown to discriminate multi-sequence alignments (MSAs) with the largest expectation fraction of PPI matches, a conclusion that holds across various definitions of intermolecular contacts and binding modes. When compared to the informational content resulting from evolution at long-range interactions, the mutual information in physically coupled amino acids is the strongest signal to distinguish PPIs derived from cospeciation and, likely, the only indication in the case of molecular coevolution in independent genomes, as the evolutive information must vanish for uncorrelated proteins.
SIGNIFICANCE
The problem of predicting protein-protein interactions (PPIs) based on multi-sequence alignments (MSAs) remains incompletely resolved to date. Previous studies took one or more sources of information into account without clarifying the isolated contributions of coevolutive, evolutive and stochastic information in resolving the problem. By benefiting from data sets made available in the sequence- and structure-rich era, we revisit the field to show that physically coupled amino acids of proteins store the largest (per-contact) information content to discriminate MSAs with the largest expectation fraction of PPI matches, a result that should guide new developments in the field aimed at characterizing protein interactions in general.

While being selected to be thermodynamically stable and kinetically accessible in a particular fold (1, 2), interacting proteins A and B coevolve to maintain their bound free-energy stability against a vast repertoire of non-specific partners and interaction modes. Protein coevolution, in the form of a time-dependent molecular process, then translates itself into a series of primary-sequence variants of A and B encoding coordinated compensatory mutations (3) and, therefore, specific protein-protein interactions (PPIs) derived from this stability-driven process (4). As a ubiquitous process in molecular biology, coevolution thus applies to protein interologs, either paralogous ...