The accuracy of computer predictions of RNA secondary structure from sequence data and free energy parameters has been increased to roughly 70%. Performance is judged by comparison with structures known from phylogenetic analysis. The algorithm also generates suboptimal structures. On average, the best structure within 10% of the lowest free energy contains roughly 90% of phylogenetically known helixes. The algorithm does not include tertiary interactions or pseudoknots and employs a crude model for single-stranded regions. The only favorable interactions are base pairing and stacking of terminal unpaired nucleotides at the ends of helixes. The excellent performance is consistent with these interactions being the primary interactions determining RNA secondary structure.

RNA is important for functions such as catalysis, RNA splicing, regulation of transcription and translation, and transport of proteins across membranes (1). Many RNA sequences are known. Determination of secondary structures, however, is difficult. Thermodynamics has been applied to predict RNA secondary structure from sequence (2, 3), but with modest success. Predictions of suboptimal structures make the method more useful (4). We report combining several recent advances to improve predictions of RNA secondary structures. Three advances are incorporated in this work. (i) New methods for synthesizing RNA make it possible to obtain model systems with a wide variety of sequences (5, 6). This technique has led to measurements of improved parameters and the realization that non-base-paired nucleotides contribute sequence-dependent interactions that stabilize secondary structure (7, 8). (ii) A computer algorithm has been developed that allows incorporation of non-base-paired interactions in the prediction of optimal and suboptimal secondary structures (9). (iii) Several RNA secondary structures have been determined by phylogeny (10-15).
Comparison of predicted and known structures allows optimization of parameters that have not been measured. Resultant predictions appear sufficiently reliable to aid planning and interpretation of experiments on RNA.
MATERIALS AND METHODS

Thermodynamic Parameters. When possible, free energy increments at 37 °C, $\Delta G^\circ_{37}$, were taken from experiments in 1 M NaCl. For fully base-paired regions, experiments on dGCATGC indicate that 1 M NaCl mimics solutions containing 1-100 mM Mg$^{2+}$ in the presence of 0.15-1 M NaCl (16).

Relatively few experiments are available for loop structures, and, therefore, little is known about interactions determining loop stability. This situation forced approximations. (i) Jacobson-Stockmayer theory (17) was used to extrapolate the length dependence of $\Delta G^\circ_{37}$ for bulge, hairpin, and internal loops:

$$\Delta G^\circ(n) = \Delta G^\circ(n_{\max}) + 1.75\,RT \ln(n/n_{\max}).$$

For this equation, $n$ is the number of unpaired nucleotides in the loop, $n_{\max}$ is the maximum-length loop for which experimental data is available, $R$ is the gas constant (1.987 cal mol$^{-1}$ K$^{-1}$; 1 cal = 4.184 J), and $T$ is the temperature in K (310.15 K for 37 °C). (ii) When...
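
The Jacobson-Stockmayer extrapolation in approximation (i) can be illustrated numerically. The following is a minimal sketch, not the authors' program: the function name `extrapolate_loop_dg`, its arguments, and the choice to work in cal/mol are assumptions made for illustration, and the measured value passed in the usage lines is a hypothetical placeholder rather than a parameter from this work. Only the equation, the value of R, and T = 310.15 K come from the text.

```python
import math

R_CAL = 1.987    # gas constant, cal mol^-1 K^-1 (value given in the text)
T_37C = 310.15   # temperature in K corresponding to 37 degrees C


def extrapolate_loop_dg(n, n_max, dg_n_max, temperature=T_37C):
    """Extrapolate a loop free energy beyond the longest measured loop.

    Implements dG(n) = dG(n_max) + 1.75 * R * T * ln(n / n_max).

    n         -- number of unpaired nucleotides in the loop
    n_max     -- longest loop for which a measured value exists
    dg_n_max  -- measured free energy for a loop of n_max nucleotides, cal/mol
    Returns the extrapolated free energy in cal/mol.
    """
    if n_max <= 0 or n < n_max:
        # The formula is used here only to extend beyond measured lengths.
        raise ValueError("expected n >= n_max > 0")
    return dg_n_max + 1.75 * R_CAL * temperature * math.log(n / n_max)


# Hypothetical placeholder: a loop of 9 nucleotides measured at 4500 cal/mol.
measured = 4500.0
for loop_size in (9, 12, 18, 30):
    print(loop_size, round(extrapolate_loop_dg(loop_size, 9, measured), 1))
```

Because the correction depends only on the ratio n/n_max, each doubling of loop length adds the same increment, about 0.75 kcal/mol at 37 °C, whatever the measured starting value.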