For 50 years the term 'gene' has been synonymous with regions of the genome encoding mRNAs that are translated into protein. However, recent genome-wide studies have shown that the human genome is pervasively transcribed and produces many thousands of regulatory non-protein-coding RNAs (ncRNAs), including microRNAs, small interfering RNAs, PIWI-interacting RNAs and various classes of long ncRNAs. It is now clear that these RNAs fulfil critical roles as transcriptional and post-transcriptional regulators and as guides of chromatin-modifying complexes. Here we review the biology of ncRNAs, focusing on the fundamental mechanisms by which ncRNAs facilitate normal development and physiology and, when dysfunctional, underpin disease. We also discuss evidence that intergenic regions associated with complex diseases express ncRNAs, as well as the potential use of ncRNAs as diagnostic markers and therapeutic targets. Taken together, these observations emphasize the need to move beyond the confines of protein-coding genes and highlight the fact that continued investigation of ncRNA biogenesis and function will be necessary for a comprehensive understanding of human disease.
There are two intriguing paradoxes in molecular biology: the inconsistent relationship between organismal complexity and (1) cellular DNA content and (2) the number of protein-coding genes, referred to as the C-value and G-value paradoxes, respectively. The C-value paradox may be largely explained by varying ploidy. The G-value paradox is more problematic, as the extent of protein-coding sequence remains relatively static over a wide range of developmental complexity. We show by analysis of sequenced genomes that the relative amount of non-protein-coding sequence increases consistently with complexity. We also show that the distribution of introns in complex organisms is non-random. Genes composed of large amounts of intronic sequence are significantly overrepresented amongst genes that are highly expressed in the nervous system, and amongst genes downregulated in embryonic stem cells and cancers. We suggest that the informational paradox in complex organisms may be explained by the expansion of cis-acting regulatory elements and genes specifying trans-acting non-protein-coding RNAs.
Small nucleolar RNAs (snoRNAs) guide RNA modification and are localized in nucleoli and Cajal bodies in eukaryotic cells. Components of the RNA silencing pathway associate with these structures, and two recent reports have revealed that a human and a protozoan snoRNA can be processed into miRNA-like RNAs. Here we show that small RNAs with evolutionary conservation of size and position are derived from the vast majority of snoRNA loci in animals (human, mouse, chicken, fruit fly), Arabidopsis, and fission yeast. In animals, sno-derived RNAs (sdRNAs) from H/ACA snoRNAs are predominantly 20-24 nucleotides (nt) in length and originate from the 3′ end. Those derived from C/D snoRNAs show a bimodal size distribution at ~17-19 nt and >27 nt and predominantly originate from the 5′ end. SdRNAs are associated with AGO7 in Arabidopsis and Ago1 in fission yeast with characteristic 5′ nucleotide biases and show altered expression patterns in fly loquacious and Dicer-2 and mouse Dicer1 and Dgcr8 mutants. These findings indicate that there is interplay between the RNA silencing and snoRNA-mediated RNA processing systems, and that sdRNAs comprise a novel and ancient class of small RNAs in eukaryotes.
During the splicing reaction, the 5′ intron end is joined to the branchpoint nucleotide, selecting the next exon to incorporate into the mature RNA and forming an intron lariat, which is excised. Despite a critical role in gene splicing, the locations and features of human splicing branchpoints are largely unknown. We use exoribonuclease digestion and targeted RNA-sequencing to enrich for sequences that traverse the lariat junction and, by split and inverted alignment, reveal the branchpoint. We identify 59,359 high-confidence human branchpoints in >10,000 genes, providing a first map of splicing branchpoints in the human genome. Branchpoints are predominantly adenosine, highly conserved, and closely distributed to the 3′ splice site. Analysis of human branchpoints reveals numerous novel features, including distinct features of branchpoints for alternatively spliced exons and a family of conserved sequence motifs overlapping branchpoints we term B-boxes, which exhibit maximal nucleotide diversity while maintaining interactions with the keto-rich U2 snRNA. Different B-box motifs exhibit divergent usage in vertebrate lineages and associate with other splicing elements and distinct intron-exon architectures, suggesting integration within a broader regulatory splicing code. Lastly, although branchpoints are refractory to common mutational processes and genetic variation, mutations occurring at branchpoint nucleotides are enriched for disease associations. [Supplemental material is available for this article.]
The majority of human genes are spliced, a process whereby introns are removed from the nascent RNA and the remaining exonic sequence joined together into a mature RNA transcript. In addition, alternative splicing generates complex networks of isoforms from human gene loci and plays a major role in shaping the diversity of the transcriptome (Kapranov et al. 2005; Gerstein et al. 2007; Djebali et al. 2012). Splicing occurs in the spliceosome, a large ribonucleoprotein complex that recognizes at least three genetic elements within each intron: the 5′ splice site (5′SS), the 3′ splice site (3′SS), and the branchpoint (Will and Lührmann 2011). RNU2-1, the U2 spliceosomal RNA (snRNA), base pairs to the sequence surrounding the unpaired branchpoint nucleotide, which then undergoes transesterification with the 5′ end of the intron to form a closed lariat structure. The spliceosome then scans for the downstream 3′ splice site, which undergoes a second transesterification reaction to join together the two exon ends and excise the intron lariat (Fig.