Within two years of their discovery in 1977, introns were found to have a positive effect on gene expression. Our results show that introns can contribute to gene expression and regulation through interaction with their corresponding mRNA sequences. Using local alignment based on the Smith-Waterman method, we obtained the optimal matched segments between intron sequences and mRNA sequences. By studying how the optimal matched regions are distributed on the intron sequences of ribosomal protein genes across about 27 species, we find that intron length evolution proceeds from the 5' end to the 3' end, with structural units added one at a time, which suggests a possible mechanism for intron length evolution. The structural units of introns are conserved at about 60 bp in length, whereas the length of the linker sequences between structural units varies widely. Interestingly, the distributions of the length and matching rate of the optimal matched segments are consistent with the sequence features of miRNA and siRNA. These results indicate that the interaction between intron sequences and mRNA sequences is a kind of functional RNA-RNA interaction, and that the two kinds of sequences have co-evolved and interact to perform their functions.
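As a rough illustration of the local alignment step described above, the following is a minimal Smith-Waterman sketch with a linear gap penalty. The scoring values (`match`, `mismatch`, `gap`), the function names, and the `matching_rate` helper are assumptions made for this example; the abstract does not state the parameters or the exact matching-rate definition the authors used.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one best local alignment.

    Scoring values are illustrative defaults, not those of the study.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; scores are floored at zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def matching_rate(aligned_a, aligned_b):
    """Fraction of aligned columns that are identical (assumed definition)."""
    if not aligned_a:
        return 0.0
    same = sum(1 for p, q in zip(aligned_a, aligned_b) if p == q and p != "-")
    return same / len(aligned_a)


score, seg_a, seg_b = smith_waterman("TTTACGTAAA", "GGGACGTCCC")
print(score, seg_a, seg_b, matching_rate(seg_a, seg_b))
```

In this toy case the two sequences share only the segment `ACGT`, so that segment is returned as the optimal matched segment. Real intron and mRNA sequences would need a more memory-efficient implementation than this quadratic matrix.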
The 2-oxoglutarate/Fe(II)-dependent (2OG) oxygenase superfamily is mainly responsible for protein modification, nucleic acid repair and/or modification, and fatty acid metabolism, and plays important roles in cancer, cardiovascular disease, and other diseases. Its members are likely to become new targets for the treatment of cancer and other diseases, so the accurate identification of 2OG oxygenases is of great significance. Many computational methods have been proposed to predict functional proteins and so compensate for time-consuming and expensive experimental identification. However, machine learning has not been applied to the study of 2OG oxygenases. In this study, we developed OGFE_RAAC, a prediction model that identifies whether a protein is a 2OG oxygenase. To improve the performance of OGFE_RAAC, 673 reduced amino acid alphabets were used to recode the protein sequences and determine the optimal feature representation scheme. A 10-fold cross-validation test showed that the model identifies 2OG oxygenases with an accuracy of 91.04%. In addition, results on an independent dataset showed that the model generalizes well and is robust. It is expected to become an effective tool for the identification of 2OG oxygenases. Further analysis suggests that the function of 2OG oxygenases may be related to their polarity and hydrophobicity, which will help follow-up studies on the catalytic mechanism of 2OG oxygenases and the way they interact with their substrates. Based on the model, a user-friendly web server was established and can be accessed at http://bioinfor.imu.edu.cn/ogferaac.
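The recoding idea behind reduced amino acid alphabets can be sketched as follows. The six-cluster grouping in `CLUSTERS` is a hypothetical example chosen for illustration and is not one of the 673 alphabets evaluated in the study; the function names and the k-mer composition features are likewise assumptions, not the published OGFE_RAAC pipeline.

```python
from collections import Counter
from itertools import product

# Hypothetical grouping of the 20 standard amino acids into 6 clusters.
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def build_mapping(clusters):
    """Map every residue to the first letter of its cluster."""
    return {aa: cluster[0] for cluster in clusters for aa in cluster}


def recode(sequence, mapping):
    """Rewrite a protein sequence in the reduced alphabet.

    Residues outside the mapping (for example X) are skipped.
    """
    return "".join(mapping[aa] for aa in sequence.upper() if aa in mapping)


def kmer_composition(reduced_seq, alphabet, k=1):
    """Return the frequency of every k-mer over the reduced alphabet."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    windows = max(len(reduced_seq) - k + 1, 0)
    counts = Counter(reduced_seq[i:i + k] for i in range(windows))
    total = max(windows, 1)
    return [counts[m] / total for m in kmers]


mapping = build_mapping(CLUSTERS)
alphabet = [cluster[0] for cluster in CLUSTERS]
reduced = recode("MKDEG", mapping)
print(reduced, kmer_composition(reduced, alphabet, k=1))
```

A feature vector like this could then be passed to any standard classifier; shrinking the alphabet keeps the number of k-mer features small (here 6^k instead of 20^k).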
Introns, as important vectors of biological functions, can influence many stages of mRNA metabolism. However, recent research has rarely considered post-spliced introns. In this study, the optimal matched regions between introns and their mRNAs in nine model organism genomes were investigated with improved Smith–Waterman local alignment software. Our results showed that the distributions of optimal matched frequencies on mRNAs were highly consistent across these genomes. There are clear peaks of optimal matched frequency in the UTR regions, especially in the 3′-UTR, whereas the matched frequencies are relatively low in the CDS regions of the mRNA. The distributions of the optimal matched frequencies also change markedly around the functional sites. The centers of the GC content distributions differ between sequence types. The matched rate distributions are highly consistent and lie mainly between 60% and 80%. The most probable length of the optimal matched segments is about 20 bp for lower eukaryotes and 30 bp for higher eukaryotes. These results show that there are abundant functional units in the introns, and that these functional units are correlated structurally with all kinds of mRNA sequences. The interaction between the post-spliced introns and their corresponding mRNAs may play a key role in gene expression.
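One simple way to picture a matched frequency distribution along an mRNA is to count how often each part of the transcript is covered by optimal matched segments. The sketch below assumes segments are given as 0-based, half-open `(start, end)` intervals on a single mRNA and that frequencies are normalized to sum to 1; both the function name and this definition are assumptions for illustration, since the abstract does not specify how the frequencies were computed.

```python
def matched_frequency(segments, mrna_length, bins=10):
    """Fraction of matched positions falling in each bin along the mRNA.

    segments: iterable of (start, end) half-open intervals, 0-based.
    The normalization used here is an assumed, illustrative definition.
    """
    counts = [0] * bins
    for start, end in segments:
        for pos in range(start, end):
            # Map the absolute position to a bin of relative position.
            counts[pos * bins // mrna_length] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


# Toy input: one segment near the 5' end and two near the 3' end.
profile = matched_frequency([(0, 10), (90, 100), (95, 100)], mrna_length=100)
print(profile)
```

With the toy input, the first and last bins carry all of the signal, which mimics a profile with peaks at the two ends of a transcript.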
Studies have shown that post-spliced introns promote cell survival when nutrients are scarce, and that intron loss or gain can influence many stages of mRNA metabolism. However, few approaches are currently available to study the correlation between intron sequences and their corresponding mature mRNA sequences. Here, using an improved Smith-Waterman local alignment algorithm (SW method) and a binding free energy weighted local alignment algorithm (BFE method), we obtained the optimal matched segments between introns and their corresponding mature mRNAs in Caenorhabditis elegans (C. elegans), together with their relative matching frequency (RF) distributions. The results showed that although the RF distributions on mRNAs obtained by the BFE method were similar to those obtained by the SW method, the interaction intensity in the 5′ and 3′ untranslated regions (UTRs) was weaker than with the SW method. The RF distributions in the exon-exon junction regions were comparable, and the effects of long and short introns on mRNA and on the five functional sites with the BFE method were similar to those with the SW method. Although the shapes of the matching rate and length distributions of the optimal matched fragments were consistent with the SW method, an increase in length was observed. The matching rates and lengths of the optimal matched fragments were mainly in the ranges of 60%–80% and 20-30 bp, respectively. Although matching preferences in the 5′ and 3′ UTRs of the mRNAs were still present with the BFE method, the matching intensities were significantly lower than those between introns and their corresponding mRNAs with the SW method. Overall, our findings suggest that the interaction between introns and mRNAs results from synergism among different types of sequences during the evolutionary process.