Almost all RNAs can fold to form extensive base-paired secondary structures. Many of these structures then modulate numerous fundamental elements of gene expression. Deducing these structure-function relationships requires that it be possible to predict RNA secondary structures accurately. However, RNA secondary structure prediction for large RNAs, such that a single predicted structure for a single sequence reliably represents the correct structure, has remained an unsolved problem. Here, we demonstrate that quantitative, nucleotide-resolution information from a SHAPE experiment can be interpreted as a pseudo-free energy change term and used to determine RNA secondary structure with high accuracy. Free energy minimization, by using SHAPE pseudo-free energies in conjunction with nearest neighbor parameters, predicts the secondary structure of deproteinized Escherichia coli 16S rRNA (>1,300 nt) and a set of smaller RNAs (75-155 nt) with accuracies of up to 96-100%, which are comparable to the best accuracies achievable by comparative sequence analysis.
RNA secondary structure | prediction | ribosome | pseudo-free energy | dynamic programming
Essentially all RNA molecules, even those with seemingly random sequences, have the ability to form extensive internal base pairs (1-3). This internal structure has profound consequences for RNA function. At large scales, long RNAs fold to form complex regulatory motifs like those found in the 5′ and 3′ untranslated regions of mRNAs and viral genomes and in large structured RNAs like ribozymes (4). On small scales, the extent of local structure over regions spanning 10-50 nt modulates whether an RNA motif can function in translation initiation by the ribosome, is accessible for interaction with the splicing machinery, or binds small siRNAs and miRNAs (5-7).
To understand these fundamental cellular processes, it must be possible to reliably establish the structure of an RNA based on a single sequence.
Accurate RNA secondary structures reflecting a single biological state are essential to deduce structure-function relationships in the many RNAs (i) for which a structure cannot be inferred by comparative analysis, (ii) that switch between distinct base-paired conformations to carry out their biological function, or (iii) that are in the process of folding to a functional state.
Two broad classes of approaches are used to score RNA secondary structure predictions for single sequences: empirical free-energy parameters (7) and knowledge-based scoring (8-10). The current best-performing algorithms achieve a sensitivity (percentage of known base pairs predicted correctly) of 40-70% (8-12). Prediction accuracies are higher for shorter RNAs, for base pairs with low contact order (the number of nucleotides that separate the paired nucleotides), and when chemical modification information is used to constrain folding (11,12). Accuracies tend to be poor for longer RNAs, and there are important short RNAs for which the prediction sensitivity is zero (12, 13).
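The two quantities defined in the preceding paragraph, sensitivity and contact order, can be illustrated with a minimal Python sketch. It is not the evaluation code used in this work. The function names, the representation of a structure as a collection of 1-based (i, j) index pairs, the requirement of an exact pair match, and the reading of contact order as the count of nucleotides strictly between the two partners are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]  # assumed representation: 1-based nucleotide indices


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Store each pair as (i, j) with i < j so (12, 5) and (5, 12) compare equal."""
    return {(min(i, j), max(i, j)) for i, j in pairs}


def sensitivity(known: Iterable[Pair], predicted: Iterable[Pair]) -> float:
    """Percentage of known base pairs that also appear in the prediction.

    Assumption: a pair counts as correct only on an exact (i, j) match.
    """
    known_set = normalize(known)
    if not known_set:
        raise ValueError("known structure has no base pairs")
    correct = known_set & normalize(predicted)
    return 100.0 * len(correct) / len(known_set)


def contact_order(pair: Pair) -> int:
    """Number of nucleotides lying strictly between the two paired positions.

    Assumption: this is one literal reading of the definition in the text.
    """
    i, j = pair
    return abs(j - i) - 1


if __name__ == "__main__":
    # Hypothetical toy structures, not data from this work.
    known = [(1, 20), (2, 19), (3, 18), (5, 12)]
    predicted = [(1, 20), (2, 19), (4, 17), (12, 5)]
    print(f"sensitivity = {sensitivity(known, predicted):.1f}%")  # 3 of 4 known pairs
    print(f"contact order of (5, 12) = {contact_order((5, 12))}")
```

Under these assumptions, a prediction that recovers three of four known pairs scores 75% sensitivity. A pair such as (5, 12) has a low contact order compared with a long-range pair such as (1, 20).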
Results
Structure of Escherichia coli 16S...