Covariation analysis is used to identify those positions with similar patterns of sequence variation in an alignment of RNA sequences. These constraints on the evolution of two positions are usually associated with a base pair in a helix. While mutual information (MI) has been used to accurately predict an RNA secondary structure and a few of its tertiary interactions, early studies revealed that phylogenetic event counting methods are more sensitive and provide extra confidence in the prediction of base pairs. We developed a novel and powerful phylogenetic events counting method (PEC) for quantifying positional covariation with the Gutell lab’s new R NA C omparative A nalysis D atabase (rCAD). The PEC and MI-based methods each identify unique base pairs, and jointly identify many other base pairs. In total, both methods in combination with an N-best and helix-extension strategy identify the maximal number of base pairs. While covariation methods have effectively and accurately predicted RNAs secondary structure, only a few tertiary structure base pairs have been identified. Analysis presented herein and at the Gutell lab’s Comparative RNA Web (CRW) Site reveal that the majority of these latter base pairs do not covary with one another. However, covariation analysis does reveal a weaker although significant covariation between sets of nucleotides that are in proximity in the three-dimensional RNA structure. This reveals that covariation analysis identifies other types of structural constraints beyond the two nucleotides that form a base pair.
The accurate prediction of the secondary and tertiary structure of an RNA with different folding algorithms are dependent on several factors, including the energy functions. However, an RNA higher-order structure cannot be accurately predicted from its sequence based on a limited set of energy parameters. The inter- and intra-molecular forces between this RNA and other small molecules and macromolecules, in addition to other factors in the cell such as pH, ionic strength, and temperature influence the complex dynamics associated with a single stranded RNA's transitioning to its secondary and tertiary structure. Since all of the factors that affect the formation of an RNAs three-dimensional structure cannot be determined experimentally, statistically derived potential energy has been used in the prediction of protein structure. In the current work, we evaluate the statistical free energy of various secondary structure motifs, including base-pair stacks, hairpin loops, and internal loops, using their statistical frequencies obtained from the comparative analysis of more than 50 000 RNA sequences stored in the RNA Comparative Analysis Database (rCAD) at the Comparative RNA Web (CRW) Site. Statistical energies were computed from the structural statistics for several datasets. While the statistical energies for base-pair stacks correlate with experimentally derived free energy values, suggesting a Boltzmann-like distribution, variation is observed between different molecules and their location on the phylogenetic tree of life. Our statistical energies for several structural elements were utilized in the Mfold RNA folding algorithm. The combined statistical energies for base-pair stacks, hairpins and internal loop flanks results in a significant improvement in the accuracy of secondary structure prediction; however, the hairpin flanks contribute the most.
RNA is directly associated with a growing number of functions within the cell. The accurate prediction of different RNAs higher-order structure from their nucleic acid sequences will provide insight into their functions and molecular mechanics. We have been determining statistical potentials for a collection of structural elements that is larger than the number of structural elements determined with experimentally determined energy values. The experimentally derived free-energies and the statistical potentials for canonical base pair stacks are analogous, demonstrating that statistical potentials derived from comparative data can be used as an alternative energetic parameter. A new computational infrastructure - RNA Comparative Analysis Database (rCAD) - that utilizes a relational database was developed to manipulate and analyze very large sequence alignments and secondary structure datasets. Using rCAD, a richer set of energetic parameters for RNA fundamental structural elements including hairpin and internal loops was determined. A new version of RNAfold was developed to utilize these statistical potentials. Overall, these new statistical potentials for hairpin and internal loops integrated into the new version of RNAfold demonstrated significant improvements in the prediction accuracy of RNA secondary structure.
Beyond its direct involvement in protein synthesis with mRNA, tRNA, and rRNA, RNA is now being appreciated for its significance in the overall metabolism and regulation of the cell. Comparative analysis has been very effective in the identification and characterization of RNA molecules, including the accurate prediction of their secondary structure. We are developing an integrative scalable data management and analysis system, the RNA Comparative Analysis Database (rCAD), implemented with SQL Server to support RNA comparative analysis. The platformagnostic database schema of rCAD captures the essential relationships between the different dimensions of information for RNA comparative analysis datasets. The rCAD implementation enables a variety of comparative analysis manipulations with multiple integrated data dimensions for advanced RNA comparative analysis workflows. In this paper, we describe details of the rCAD schema design and illustrate its usefulness with two usage scenarios.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.