Analysis of RNA-seq data often detects numerous ‘non-co-linear’ (NCL) transcripts, which comprised sequence segments that are topologically inconsistent with their corresponding DNA sequences in the reference genome. However, detection of NCL transcripts involves two major challenges: removal of false positives arising from alignment artifacts and discrimination between different types of NCL transcripts (trans-spliced, circular or fusion transcripts). Here, we developed a new NCL-transcript-detecting method (‘NCLscan’), which utilized a stepwise alignment strategy to almost completely eliminate false calls (>98% precision) without sacrificing true positives, enabling NCLscan outperform 18 other publicly-available tools (including fusion- and circular-RNA-detecting tools) in terms of sensitivity and precision, regardless of the generation strategy of simulated dataset, type of intragenic or intergenic NCL event, read depth of coverage, read length or expression level of NCL transcript. With the high accuracy, NCLscan was applied to distinguishing between trans-spliced, circular and fusion transcripts on the basis of poly(A)- and nonpoly(A)-selected RNA-seq data. We showed that circular RNAs were expressed more ubiquitously, more abundantly and less cell type-specifically than trans-spliced and fusion transcripts. Our study thus describes a robust pipeline for the discovery of NCL transcripts, and sheds light on the fundamental biology of these non-canonical RNA events in human transcriptome.
Transcriptionally non-co-linear (NCL) transcripts can originate from trans-splicing (trans-spliced RNA; ‘tsRNA’) or cis-backsplicing (circular RNA; ‘circRNA’). While numerous circRNAs have been detected in various species, tsRNAs remain largely uninvestigated. Here, we utilize integrative transcriptome sequencing of poly(A)- and non-poly(A)-selected RNA-seq data from diverse human cell lines to distinguish between tsRNAs and circRNAs. We identified 24,498 NCL events and found that a considerable proportion (20–35%) of them arise from both tsRNAs and circRNAs, representing extensive alternative trans-splicing and cis-backsplicing in human cells. We show that sequence generalities of exon circularization are also observed in tsRNAs. Recapitulation of NCL RNAs further shows that inverted Alu repeats can simultaneously promote the formation of tsRNAs and circRNAs. However, tsRNAs and circRNAs exhibit quite different, or even opposite, expression patterns, in terms of correlation with the expression of their co-linear counterparts, expression breadth/abundance, transcript stability, and subcellular localization preference. These results indicate that tsRNAs and circRNAs may play different regulatory roles and analysis of NCL events should take the joint effects of different NCL-splicing types and joint effects of multiple NCL events into consideration. This study describes the first transcriptome-wide analysis of trans-splicing and cis-backsplicing, expanding our understanding of the complexity of the human transcriptome.
Adenosine-to-inosine (A-to-I) editing is widespread across the kingdom Metazoa. However, for the lack of comprehensive analysis in nonmodel animals, the evolutionary history of A-to-I editing remains largely unexplored. Here, we detect high-confidence editing sites using clustering and conservation strategies based on RNA sequencing data alone, without using single-nucleotide polymorphism information or genome sequencing data from the same sample. We thereby unveil the first evolutionary landscape of A-to-I editing maps across 20 metazoan species (from worm to human), providing unprecedented evidence on how the editing mechanism gradually expands its territory and increases its influence along the history of evolution. Our result revealed that highly clustered and conserved editing sites tended to have a higher editing level and a higher magnitude of the ADAR motif. The ratio of the frequencies of nonsynonymous editing to that of synonymous editing remarkably increased with increasing the conservation level of A-to-I editing. These results thus suggest potentially functional benefit of highly clustered and conserved editing sites. In addition, spatiotemporal dynamics analyses reveal a conserved enrichment of editing and ADAR expression in the central nervous system throughout more than 300 Myr of divergent evolution in complex animals and the comparability of editing patterns between invertebrates and between vertebrates during development. This study provides evolutionary and dynamic aspects of A-to-I editome across metazoan species, expanding this important but understudied class of nongenomically encoded events for comprehensive characterization.
Gene expression is largely regulated by DNA methylation, transcription factor (TF), and microRNA (miRNA) before, during, and after transcription, respectively. Although the evolutionary effects of TF/miRNA regulations have been widely studied, evolutionary analysis of simultaneously accounting for DNA methylation, TF, and miRNA regulations and whether promoter methylation and gene body (coding regions) methylation have different effects on the rate of gene evolution remain uninvestigated. Here, we compared human–macaque and human–mouse protein evolutionary rates against experimentally determined single base-resolution DNA methylation data, revealing that promoter methylation level is positively correlated with protein evolutionary rates but negatively correlated with TF/miRNA regulations, whereas the opposite was observed for gene body methylation level. Our results showed that the relative importance of these regulatory factors in determining the rate of mammalian protein evolution is as follows: Promoter methylation ≈ miRNA regulation > gene body methylation > TF regulation, and further indicated that promoter methylation and miRNA regulation have a significant dependent effect on protein evolutionary rates. Although the mechanisms underlying cooperation between DNA methylation and TFs/miRNAs in gene regulation remain unclear, our study helps to not only illuminate the impact of these regulatory factors on mammalian protein evolution but also their intricate interaction within gene regulatory networks.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.