Synopsis
It is demonstrated that protein α-helix content can be predicted from an autocorrelation analysis of the protein hydrophobicity sequence. The Fourier transform of the autocorrelation function yields the spectral densities, or weights, of the various frequencies contributing to the autocorrelation function. Using sequence and secondary structure data from more than 160 proteins and domains, a linear relationship was found between the spectral density at periodicity 3.7 and protein α-helix content (r = 0.83). This relation permits prediction of the helix content (x) of proteins of known sequence to within ±15%, i.e., as (x ± 15)%. Predictions based on the autocorrelation procedure are compared with values obtained by other methods.
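The procedure summarized above (hydrophobicity profile, then autocorrelation, then spectral density at period 3.7, then a linear calibration) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the Kyte-Doolittle values stand in for whatever hydrophobicity scale the paper actually uses, the normalization of r(k) and the Tukey lag window are common textbook choices assumed here, and the function names are hypothetical. The regression slope and intercept are not given in this section, so `predict_helix_percent` requires the caller to supply them from a fit to proteins of known helix content.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Kyte-Doolittle hydropathy values: an illustrative stand-in only (assumption);
# the hydrophobicity scale used in the paper is not specified in this section.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(seq: str, scale: Dict[str, float] = KYTE_DOOLITTLE) -> List[float]:
    """Replace each residue by its hydrophobicity value, giving the 'time series'."""
    return [scale[aa] for aa in seq.upper()]


def autocorrelation(h: Sequence[float], max_lag: int) -> List[float]:
    """Normalized autocorrelation r(k) for k = 0..max_lag, with r(0) = 1.

    Assumed definition: mean-centred lag covariance averaged over the N - k
    available pairs, divided by the variance of the whole profile.
    """
    n = len(h)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(h)")
    mean = sum(h) / n
    d = [x - mean for x in h]
    var = sum(x * x for x in d) / n
    if var == 0:
        raise ValueError("a constant profile has no defined autocorrelation")
    r = []
    for k in range(max_lag + 1):
        cov = sum(d[i] * d[i + k] for i in range(n - k)) / (n - k)
        r.append(cov / var)
    return r


def spectral_density(r: Sequence[float], period: float) -> float:
    """Cosine (Fourier) transform of r(k) at frequency 1/period.

    Uses a Tukey lag window w(k) = (1 + cos(pi k / M)) / 2 (an assumed,
    Blackman-Tukey style smoothing choice) to damp the poorly estimated
    large-lag terms.
    """
    m = len(r) - 1
    f = 1.0 / period
    s = r[0]
    for k in range(1, m + 1):
        w = 0.5 * (1.0 + math.cos(math.pi * k / m))
        s += 2.0 * w * r[k] * math.cos(2.0 * math.pi * f * k)
    return s


def predict_helix_percent(
    s37: float, slope: float, intercept: float, band: float = 15.0
) -> Tuple[float, float, float]:
    """Hypothetical linear calibration x = slope * S(3.7) + intercept.

    Returns (x - band, x, x + band), mirroring the (x +/- 15)% reporting.
    """
    x = slope * s37 + intercept
    return x - band, x, x + band
```

For a sequence string `seq`, `spectral_density(autocorrelation(hydrophobicity_profile(seq), 20), 3.7)` gives the spectral density at periodicity 3.7 (the maximum lag of 20 is an arbitrary illustrative choice); that value would then be passed to `predict_helix_percent` with fitted coefficients. A profile whose hydrophobicity oscillates with a period near 3.7 residues, as in an amphipathic helix, concentrates weight at this frequency.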
INTRODUCTION
Ever since Linderstrøm-Lang and Schellman1 first recognized structural levels of organization within a protein, there have been many attempts to understand how protein secondary and tertiary structure is specified by the primary amino acid sequence. Some of these procedures have employed a statistical approach, some a pattern-recognition approach, others an information theoretic approach, and others an energy minimization approach.11,12 One feature most of these procedures have in common is that they describe the information transfer from sequence to structure in terms of the 20 commonly occurring amino acids.
An alternative approach was taken by Zimmerman et al.13 Instead of considering structure from the point of view of the occurrence of each amino acid along the chain, they replaced each amino acid in a sequence with a quantitative measure of one or more of its physical properties. These properties were selected for their likely significance in establishing secondary and tertiary structure within the folded protein. The resulting linear series of numbers was treated as a time series and analyzed by autocorrelation methods. The essential feature of this approach is that patterns of variation in amino acid properties are observed, rather than the frequencies of occurrence of the amino acids themselves.
Biopolymers, Vol. 27, 451-477 (1988)
452 HORNE
Jones,14 Kubota et al.,15 and, most recently, Macchiato et al.16 have since used this approach, the groups differing only in which amino acid properties they included in the analysis. The original proponents, Zimmerman et al.,13 considered the most significant properties to be bulkiness; polarity; Rf ranking in paper chromatography; acidity, through the indices pI and pK; and, lastly, hydrophobicity based on the solubility data of Tanford.17 Jones14 used the revised hydrophobicity scale of Nozaki and Tanford,18 deleted the isoionic point scale from the Zimmerman list, and included several geometric parameters specific to each amino acid at its particular position in an individual sequence. Kubota et al.15 described the amino acids using 10 indices, none of them geometric. The most important new additions were the propensities to form an α-helix or β-...