Central to Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas systems are repeated RNA sequences that serve as Cas-protein–binding templates. Classification is based on the architectural composition of associated Cas proteins; considering repeat evolution is essential to complete the picture. We compiled the largest data set of CRISPRs to date, performed comprehensive, independent clustering analyses and identified a novel set of 40 conserved sequence families and 33 potential structure motifs for Cas-endoribonucleases with some distinct conservation patterns. Evolutionary relationships are presented as a hierarchical map of sequence and structure similarities for both quick and detailed insight into the diversity of CRISPR-Cas systems. In a comparison with Cas-subtypes, I-C, I-E, I-F and type II were strongly coupled and the remaining type I and type III subtypes were loosely coupled to repeat and Cas1 evolution, respectively. Subtypes with a strong link to CRISPR evolution were almost exclusive to bacteria; nevertheless, we identified rare examples of potential horizontal transfer of I-C and I-E systems into archaeal organisms. Our easy-to-use web server provides an automated assignment of newly sequenced CRISPRs to our classification system and enables more informed choices on future hypotheses in CRISPR-Cas research: http://rna.informatik.uni-freiburg.de/CRISPRmap.
Large-scale RNA sequencing has revealed a large number of long mRNA-like transcripts (lncRNAs) that do not code for proteins. The evolutionary history of these lncRNAs has been notoriously hard to study systematically because their low level of sequence conservation precludes comprehensive homology-based surveys and makes them nearly impossible to align. An increasing number of special cases, however, have been shown to be at least as old as the vertebrate lineage. Here we use the conservation of splice sites to trace the evolution of lncRNAs. We show that >85% of the human GENCODE lncRNAs were already present at the divergence of placental mammals and many hundreds of these RNAs date back even further. Nevertheless, we observe a fast turnover of intron/exon structures. We conclude that lncRNA genes are evolutionarily ancient components of vertebrate genomes that show an unexpected and unprecedented evolutionary plasticity. We offer a public web service (http://splicemap.bioinf.unileipzig.de) that allows users to retrieve sets of orthologous splice sites and to produce overview maps of evolutionarily conserved splice sites for visualization and further analysis. An electronic supplement containing the ncRNA data sets used in this study is available at
Motivation: Clustering according to sequence–structure similarity has now become a generally accepted scheme for ncRNA annotation. Its application to complete genomic sequences as well as whole transcriptomes is therefore desirable but hindered by extremely high computational costs. Results: We present a novel linear-time, alignment-free method for comparing and clustering RNAs according to sequence and structure. The approach scales to datasets of hundreds of thousands of sequences. The quality of the retrieved clusters has been benchmarked against known ncRNA datasets and is comparable to state-of-the-art sequence–structure methods while achieving speedups of several orders of magnitude. A selection of applications aiming at the detection of novel structural ncRNAs is presented. As one example, we predicted local structural elements specific to lincRNAs that likely associate the involved transcripts functionally with vital processes of the human nervous system. In total, we predicted 349 local structural RNA elements. Availability: The pipeline is available on request. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online.
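To make the idea of alignment-free comparison concrete, the following is a minimal Python sketch. It is not the method described in the abstract above: it looks at sequence only (no structure), uses plain k-mer count profiles with cosine similarity, and groups sequences with a simple greedy pass. The function names, the choice of k = 3, and the 0.8 threshold are all illustrative assumptions. Building each profile takes time linear in the sequence length, but the greedy grouping shown here is not linear in the number of sequences.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a single pass over the sequence."""
    # Normalise to an RNA alphabet so DNA and RNA input compare equally.
    seq = seq.upper().replace("T", "U")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse k-mer count vectors."""
    # Counter returns 0 for missing keys, so unshared k-mers contribute nothing.
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_cluster(seqs, k: int = 3, threshold: float = 0.8):
    """Put each sequence in the first cluster whose representative is similar enough.

    Hypothetical illustration only: the first member of a cluster acts as its
    representative, and no alignment is ever computed.
    """
    clusters = []  # list of (representative profile, member indices)
    for idx, seq in enumerate(seqs):
        profile = kmer_profile(seq, k)
        for rep, members in clusters:
            if cosine_similarity(profile, rep) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((profile, [idx]))
    return [members for _, members in clusters]
```

Calling `greedy_cluster(["GGGAAACCC", "GGGAAACCC", "AUAUAUAUAU"])` groups the two identical sequences together and leaves the third on its own, because the third shares no 3-mers with the others.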
Acute myeloid leukemia (AML) can be grouped into morphologically or genetically defined subtypes. Today, the AML phenotype-genotype associations, that is, FAB/WHO (French-American-British/World Health Organization) definitions and recurrent molecular mutations, are not fully understood. Therefore, we evaluated the impact of molecular mutations on the AML differentiation stage by molecular profiling of 4373 adult de novo AML patients in 7 cytomorphological subtypes. We investigated mutations in 20 genes, including myeloid transcription factors (CEBPA, RUNX1), tumor suppressors (TP53, WT1), DNA modifiers (DNMT3A, IDH1/2, TET2), chromatin modifiers (ASXL1, MLL), signal transduction genes (FLT3, KRAS, NRAS) and NPM1. The most frequently mutated genes per cytomorphological subtype were RUNX1 in M0 (43%), NPM1 in M1 (42%), DNMT3A in M2 (26%), NPM1 in M4 (57%), M5a (49%) and M5b (70%) and TP53 in M6 (36%). Although some gene mutations were frequent in several cytomorphological subtypes, a series of associations of co-occurring mutations with distinct phenotypes were identified for molecularly defined subcohorts. FLT3, NPM1 and WT1 mutations were associated with an immature phenotype in myeloblastic AML, whereas other combinations involving ASXL1, RUNX1, MLL-PTD, CEBPA or KRAS were more frequent in myeloblastic AML with maturation. Within the NPM1 mutated subcohort, ASXL1 mutations were significantly associated with a monoblastic differentiation and DNMT3A mutations with a monocytic phenotype.
Background: Recent experimental and computational studies have provided overwhelming evidence for a plethora of diverse transcripts that are unrelated to protein-coding genes. One subclass consists of those RNAs that require distinctive secondary structure motifs to exert their biological function and hence exhibit distinctive patterns of sequence conservation characteristic for positive selection on RNA secondary structure.