We present an intuitive strategy for predicting the effect of sequence variation on splicing. In contrast to transcriptional elements, splicing elements appear to be strongly position dependent. We demonstrated that exonic binding of the normally intronic splicing factor, U2AF65, inhibits splicing. Reasoning that the positional distribution of a splicing element is a signature of its function, we developed a method for organizing all possible sequence motifs into clusters based on the genomic profile of their positional distribution around splice sites. Binding sites for serine/arginine rich (SR) proteins tended to be exonic whereas heterogeneous ribonucleoprotein (hnRNP) recognition elements were mostly intronic. In addition to the known elements, novel motifs were returned and validated. This method was also predictive of splicing mutations. A mutation in a motif creates a new motif that sometimes has a similar distribution shape to the original motif and sometimes has a different distribution. We created an intraallelic distance measure to capture this property and found that mutations that created large intraallelic distances disrupted splicing in vivo whereas mutations with small distances did not alter splicing. Analyzing the dataset of human disease alleles revealed known splicing mutants to have high intraallelic distances and suggested that 22% of disease alleles that were originally classified as missense mutations may also affect splicing. This category, together with mutations in the canonical splicing signals, suggests that approximately one third of all disease-causing mutations alter pre-mRNA splicing.

Splicing is catalyzed by the spliceosome, a riboprotein complex that rivals the ribosome in size and complexity. The ribosome has a large and small subunit whose assembly on the mRNA substrate corresponds to a functional switch from initiation to elongation. 
The spliceosome is composed of five subunits that appear to exist in at least four different stable configurations and, like the ribosomal subunits, transition between different assembled states corresponding to different stages of function (1-3). Mass spectroscopy has identified at least 300 RNA and protein components in this catalytic complex and studies have demonstrated heterogeneity in spliceosomal complexes isolated from different splicing substrates (4-6). The spliceosomal components that recognize the basic cis-elements of the splicing process are known. How the spliceosome assembles and reorganizes on these elements is also fairly well understood. However, several computational analyses estimate that these basic splicing elements contain at most half the information necessary for splice site recognition (7,8). The remaining information lies outside these splice sites, presumably as enhancers or silencers. This information required to specify splicing presents a considerable mutational target: estimates of the fraction of disease mutations that affect splicing range from 15% (9) to 62% (10). Transcript analysis of genotyped cell lines has dis...
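The intraallelic distance described in the abstract above can be illustrated with a minimal sketch. It assumes that each motif's positional distribution is a normalized count of occurrences at each offset within fixed-width windows centred on splice sites, and that the distance between the reference and mutant motif profiles is an L1 distance. Both choices, and all function names, are illustrative assumptions; the excerpt does not specify the metric or normalization actually used.

```python
def positional_profile(motif, windows):
    """Normalized frequency of `motif` at each offset across the windows.

    `windows` is a list of equal-length sequences, each assumed to be
    aligned on a splice site (an assumption of this sketch).
    """
    k = len(motif)
    width = len(windows[0])
    counts = [0] * (width - k + 1)
    for seq in windows:
        for i in range(width - k + 1):
            if seq[i:i + k] == motif:
                counts[i] += 1
    total = sum(counts)
    if total == 0:
        # Motif never observed: return an all-zero profile.
        return [0.0] * len(counts)
    return [c / total for c in counts]


def profile_distance(p, q):
    """L1 distance between two profiles (assumed metric, not the paper's)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def intraallelic_distance(ref_motif, alt_motif, windows):
    """Distance between the profiles of a motif and its mutated form."""
    return profile_distance(
        positional_profile(ref_motif, windows),
        positional_profile(alt_motif, windows),
    )
```

Under these assumptions, a mutation that converts a motif into one with a similar positional profile yields a distance near zero, while a mutation that moves the motif into a differently distributed class yields a large distance.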
RNA secondary structure plays an integral role in catalytic, ribosomal, small nuclear, micro, and transfer RNAs. Discovering a prevalent role for secondary structure in pre-mRNAs has proven more elusive. By utilizing a variety of computational and biochemical approaches, we present evidence for a class of nuclear introns that relies upon secondary structure for correct splicing. These introns are defined by simple repeat expansions of complementary AC and GT dimers that co-occur at opposite boundaries of an intron to form a bridging structure that enforces correct splice site pairing. Remarkably, this class of introns does not require U2AF2, a core component of the spliceosome, for its processing. Phylogenetic analysis suggests that this mechanism was present in the ancestral vertebrate lineage prior to the divergence of tetrapods from teleosts. While largely lost from land dwelling vertebrates, this class of introns is found in 10% of all zebrafish genes.

[Supplemental material is available for this article.]

RNA splicing is a process that removes an internal segment of RNA (i.e., the intron) and rejoins together the two flanking segments (exons). Distinct but evolutionarily related versions of this processing reaction are found in prokaryotes and eukaryotes in a variety of different contexts. In eukaryotes, the splicing of nuclear introns is catalyzed by a large riboprotein complex called the spliceosome (Matlin and Moore 2007). RNA encoded by genes in organelles and some bacterial genomes contain self-splicing group I and II introns which catalyze their own removal (Cech et al. 1981). A basic problem for all introns is the correct identification and pairing of the splice sites. In group I and II introns, this pairing function is performed by RNA secondary structure alone, whereas in spliceosomal introns, small nuclear ribonucleoproteins (snRNPs) recognize and pair together the correct 5′ splice site (5′ss) and branchpoint site (BP). 
However, there are some examples where the pairing of sites is assisted by intramolecular secondary structure in the intron (Goguel and Rosbash 1993; Libri et al. 1995; Charpentier and Rosbash 1996; Howe and Ares 1997; Spingola et al. 1999). In addition, there are some fascinating examples of how secondary structures can regulate mutually exclusive alternative splicing (Warf and Berglund 2007; McManus and Graveley 2011): Several regions of the Dscam1 pre-mRNA undergo extensive alternative splicing. In one of these regions, an upstream "selector" sequence near exon 5 can select from an array of 48 complementary downstream "docking" sequences. Each "docking" sequence can potentially base-pair with the "selector" sequence, thereby bringing an alternate version of exon 6 to splice to exon 5.

Secondary structure in RNA can be identified experimentally or computationally. There are currently around a thousand publicly available structures: 53% determined by X-ray crystallography and 47% by solution NMR (Bernstein et al. 1977). There have been a great many advances in computational approaches t...
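The intron class described in the abstract above, with complementary AC and GT repeats at opposite intron boundaries, could be screened for with a simple pattern search. The sketch below is illustrative only: the minimum repeat count, the flank length, and the function name are assumptions and are not taken from the source, which does not state the criteria used in this excerpt.

```python
import re


def has_bridging_repeats(intron, min_repeats=4, flank=50):
    """Return True if (AC)n lies in one flank and (GT)n in the other.

    (AC)n and (GT)n are reverse complements, so repeats at opposite
    ends of an intron can in principle base-pair. `min_repeats` and
    `flank` are hypothetical thresholds chosen for illustration.
    """
    ac = re.compile(r"(?:AC){%d,}" % min_repeats)
    gt = re.compile(r"(?:GT){%d,}" % min_repeats)
    head = intron[:flank].upper()   # region next to the 5' splice site
    tail = intron[-flank:].upper()  # region next to the 3' splice site
    return bool(
        (ac.search(head) and gt.search(tail))
        or (gt.search(head) and ac.search(tail))
    )
```

For introns shorter than twice the flank length, the two flanks overlap, so a real screen would likely need an additional minimum-separation rule.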
The pluripotency control regions (PluCRs) are defined as genomic regions that are bound by POU5F1, SOX2, and NANOG in vivo. We utilized a high-throughput binding assay to record more than 270,000 different DNA/protein binding measurements along incrementally tiled windows of DNA within these PluCRs. This high-resolution binding map is then used to systematically define the context of POU factor binding, and reveals patterns of cooperativity and competition in the pluripotency network. The most prominent pattern is a pervasive binding competition between POU5F1 and the forkhead transcription factors. Like many transcription factors, POU5F1 is co-expressed with a paralog, POU2F1, that shares an apparently identical binding specificity. By analyzing thousands of binding measurements, we discover context effects that discriminate POU2F1 from POU5F1 binding. Proximal NANOG binding promotes POU5F1 binding, whereas nearby SOX2 binding favors POU2F1. We demonstrate by cross-species comparison and by chromatin immunoprecipitation (ChIP) that the contextual sequence determinants learned in vitro are sufficient to predict POU2F1 binding in vivo.
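The incrementally tiled windows mentioned above can be pictured with a short sketch that slides a fixed-length window across a region in fixed steps. The window length and step size below are placeholder values, and the function name is hypothetical; the abstract does not give the oligonucleotide length or tiling increment used in the assay.

```python
def tile_windows(region, window=30, step=10):
    """Yield (start, subsequence) for overlapping windows across `region`.

    `window` and `step` are illustrative defaults, not values from the
    source. A region shorter than `window` yields one truncated window.
    """
    last_start = max(len(region) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, region[start:start + window]
```

Each tiled window would correspond to one probe in the binding assay, so a binding score can be assigned to every start coordinate along the region.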
There are a variety of in vivo and in vitro methods to determine the genome-wide specificity of a particular trans-acting factor. However, there is an inherent limitation to these candidate approaches. Most biological studies focus on the regulation of particular genes, which are bound by numerous unknown trans-acting factors. Therefore, most biological inquiries would be better addressed by a method that maps all trans-acting factors that bind particular regions rather than identifying all regions bound by a particular trans-acting factor. Here, we present a high-throughput binding assay that returns thousands of unbiased measurements of complex formation on nucleic acid. We applied this method to identify transcriptional complexes that form on DNA regions upstream of genes involved in pluripotency in embryonic stem cells (ES cells) before and after differentiation. The raw binding scores, motif analysis, and expression data are used to computationally reconstruct remodeling events, returning the identity of the transcription factor(s) most likely to comprise the complex. The most significant remodeling event during ES cell differentiation occurred upstream of the REST gene, a transcriptional repressor that blocks neurogenesis. We also demonstrate how this method can be used to discover RNA elements and discuss applications of screening polymorphisms for allelic differences in binding.