The identification of protein sites undergoing correlated evolution (coevolution) is of great interest due to the possibility that these pairs will tend to be adjacent in the three-dimensional structure. Identification of such pairs should provide useful information for understanding the evolutionary process, predicting the effects of site-directed substitution, and potentially for predicting protein structure. Here, we develop and apply a maximum likelihood method with the aim of improving detection of coevolution. Unlike previous methods which have had limited success, this method allows for correlations induced by phylogenetic relationships and for variation in rate of evolution along branches, and does not rely on accurate reconstruction of ancestral nodes. In order to reduce the complexity of coevolutionary relationships and identify the primary component of pairwise coevolution between two sites, we reduce the data to a two-state system at each site, regardless of the actual number of residues observed at that site. Simulations show that this strategy is good at identifying simple correlations and at recognizing cases in which the data are insufficient to distinguish between coevolution and spurious correlations. The new method was tested by using size and charge characteristics to group the residues at each site, and then evaluating coevolution in myoglobin sequences. Grouping based on physicochemical characteristics allows categorization of coevolving sites into positive and negative coevolution, depending on the correlation between equilibrium state frequencies. We detected a striking excess of negative coevolution (corresponding to charge) at sites brought into proximity by the periodicity of the α-helix, and there was also a tendency for sites with significant likelihood ratios to be close in the three-dimensional structure.
Sites on the surface of the protein appear to coevolve both when they are close in the structure, and when they are distant, implying a role for folding and/or avoidance of quaternary structure in the coevolution process.
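
The reduction to a two-state system described above can be illustrated with a minimal sketch. It shows only the data-reduction step, not the maximum likelihood method itself: the raw co-occurrence counts it produces ignore phylogenetic relationships, which is exactly the confounding the method is designed to handle. The residue grouping (`CHARGED`), the function names, and the example columns are hypothetical choices made for illustration and are not taken from the paper.

```python
from collections import Counter

# Hypothetical grouping for illustration only: charged vs. uncharged residues.
# The grouping actually used for the myoglobin analysis may differ.
CHARGED = set("DEKRH")


def to_two_state(column, group=CHARGED):
    """Recode one alignment column as 1 (residue in group) or 0 (not in group)."""
    return [1 if residue in group else 0 for residue in column]


def joint_counts(col_a, col_b):
    """Count how often each pair of two-state codes co-occurs across sequences."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same set of sequences")
    counts = Counter(zip(to_two_state(col_a), to_two_state(col_b)))
    # Report all four state pairs, including those never observed.
    return {pair: counts.get(pair, 0) for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]}


# Made-up columns: one residue per sequence at each of two sites.
site_a = "DDEAAG"
site_b = "KKRLLV"
print(joint_counts(site_a, site_b))
```

In this toy example the two sites share the same state in every sequence, but such a pattern alone cannot separate coevolution from shared ancestry. Telling the two apart requires a method that accounts for the tree relating the sequences.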
© 1999 Academic Press

Keywords: coevolution; protein residues; protein structure; maximum likelihood; molecular evolution
*Corresponding author
Introduction

There has been a great deal of recent research on methods for detecting correlated changes in protein sequence evolution (Altschuh et al., 1987; Taylor & Hatrick, 1994; Göbel et al., 1994; Neher, 1994; Shindyalov et al., 1994; Pollock & Taylor, 1997; Pazos et al., 1997; Chelvanayagam et al., 1997). It is expected that the residues at some sites will strongly affect the evolution of certain other sites which are close in the three-dimensional structure of the protein. At such sites, a substitution which partly destabilizes the protein structure or function could be corrected by a subsequent (or simultaneous) substitution at an adjacent site. For example, a substitution involving reduction of volume in the protein core might cause a destabilizing pocket which only one or a few adjacent residues would be capable of filling with...