mRNA degradation is an important cellular mechanism involved in the control of gene expression. Several genome-wide profiling methods have been developed for detecting mRNA degradation in plants and animals. However, because many of these techniques use poly(A) mRNA for library preparation, degradation intermediates are often detected only near the 3′-ends of transcripts. Previously, we developed the Truncated RNA End Sequencing (TREseq) method using Arabidopsis thaliana and demonstrated that this method ameliorates 3′-end bias. In analyses using TREseq, we observed G-rich sequences near the 5′-ends of degradation intermediates. However, this finding remained to be confirmed in other plant species. Hence, in this study, we conducted TREseq analyses in Lactuca sativa (lettuce), Oryza sativa (rice) and Rosa hybrida (rose). Together with A. thaliana, these species were selected to span a diverse range of the angiosperm phylogeny. The results revealed similar sequence features near the 5′-ends of degradation intermediates, and the involvement of the translation process, in all four species. In addition, homologous genes showed similar mRNA degradation efficiencies in different plants, suggesting that mechanisms of mRNA degradation are conserved across plant species. These strong sequence features were not observed in previous degradome analyses of different plant species.
Background
RNA degradation is important for the regulation of gene expression. Despite the identification of proteins and sequences related to deadenylation-dependent RNA degradation in plants, endonucleolytic cleavage-dependent RNA degradation has not been studied in detail. Previously, we developed truncated RNA end sequencing (TREseq) in Arabidopsis thaliana to identify cleavage sites and evaluate the cleavage efficiency at each site. Although several features are related to RNA cleavage efficiency, the contribution of each feature has not been evaluated while considering multiple putative determinants together in A. thaliana.
Results
Cleavage site information was acquired from a previous study, and the cleavage efficiency at the site level (CSsite value), defined as the number of reads at each cleavage site normalized to RNA abundance, was calculated. To identify features related to site-level cleavage efficiency, multiple putative determinants (features) were subjected to feature selection using the Least Absolute Shrinkage and Selection Operator (LASSO) regression model. The results indicated that whole RNA features, in addition to features around cleavage sites, were important for the CSsite value. Whole RNA features related to the translation process and nucleotide frequencies around cleavage sites were the major determinants of cleavage efficiency. A model constructed using only sequence features achieved prediction accuracy similar to that of the model using all features, including those related to the translation process, suggesting that cleavage efficiency can be predicted from sequence information alone. The LASSO regression model was also validated on exogenous genes, showing that the model constructed using only sequence information can predict cleavage efficiency in both endogenous and exogenous genes.
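The feature-selection idea can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state which software, features, or penalty strength were used. The sketch assumes a toy data set in which six hypothetical, roughly centred features predict a stand-in for the CSsite value, and it fits LASSO by plain cyclic coordinate descent in pure Python. The names `lasso_coordinate_descent`, `make_toy_data`, and the penalty `lam=0.2` are illustrative choices only.

```python
import random


def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    X is a list of sample rows and y a list of targets. No intercept is
    fitted, so features and target are assumed to be (roughly) centred.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    # Mean squared value of each feature column.
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    residual = list(y)  # residual = y - Xw, and w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue
            # Covariance of feature j with the residual, with feature j's
            # own current contribution added back in.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j])
                      for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / z[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def make_toy_data(n=200, seed=0):
    """Hypothetical data: only features 0 and 2 influence the target."""
    rng = random.Random(seed)
    true_w = [1.5, 0.0, -2.0, 0.0, 0.0, 0.0]
    X = [[rng.gauss(0, 1) for _ in true_w] for _ in range(n)]
    y = [sum(a * b for a, b in zip(row, true_w)) + rng.gauss(0, 0.1)
         for row in X]
    return X, y


X, y = make_toy_data()
weights = lasso_coordinate_descent(X, y, lam=0.2)
# Features whose coefficient survived the L1 penalty count as "selected".
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("coefficients:", [round(wj, 3) for wj in weights])
print("selected feature indices:", selected)
```

The L1 penalty sets the coefficients of uninformative features exactly to zero, which is why LASSO doubles as a feature-selection step. In practice the penalty strength would be tuned, for example by cross-validation, rather than fixed as it is here.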
Conclusions
Feature selection using the LASSO regression model in A. thaliana identified 155 features. Correlation coefficients revealed that whole RNA features are important for determining cleavage efficiency in addition to features around the cleavage sites. The LASSO regression model can predict cleavage efficiency in endogenous and exogenous genes using only sequence information. The model revealed the significance of the effect of multiple determinants on cleavage efficiency, suggesting that sequence features are important for RNA degradation mechanisms in A. thaliana.