Understanding genome organization and gene regulation requires insight into RNA transcription, processing and modification. We adapted nanopore direct RNA sequencing to examine RNA from a wild-type accession of the model plant Arabidopsis thaliana and a mutant defective in mRNA methylation (m6A). Here we show that m6A can be mapped in full-length mRNAs transcriptome-wide and reveal the combinatorial diversity of cap-associated transcription start sites, splicing events, poly(A) site choice and poly(A) tail length. Loss of m6A from 3′ untranslated regions is associated with decreased relative transcript abundance and defective RNA 3′ end formation. A functional consequence of disrupted m6A is a lengthening of the circadian period. We conclude that nanopore direct RNA sequencing can reveal the complexity of mRNA processing and modification in full-length single molecule reads. These findings can refine Arabidopsis genome annotation. Further, applying this approach to less well-studied species could transform our understanding of what their genomes encode.
The methyltransferase complex (m6A writer), which catalyzes the deposition of N6-methyladenosine (m6A) in mRNAs, is highly conserved across most eukaryotic organisms, but its components and interactions between them are still far from fully understood. Here, using in vivo interaction proteomics, two HAKAI-interacting zinc finger proteins, HIZ1 and HIZ2, are discovered as components of the Arabidopsis m6A writer complex. HAKAI is required for the interaction between HIZ1 and MTA (mRNA adenosine methylase A). Whilst HIZ1 knockout plants have normal levels of m6A, plants in which it is overexpressed show reduced methylation and decreased lateral root formation. Mutant plants lacking HIZ2 are viable but have an 85% reduction in m6A abundance and show severe developmental defects. Our findings suggest that HIZ2 is likely the plant equivalent of ZC3H13 (Flacc) of the metazoan m6A-METTL Associated Complex.
[…] misidentification of 3′ ends through internal priming [3], spurious antisense and splicing events produced by RT template switching [4,5], and the inability to detect all base modifications in the copying process [6]. The fragmentation of RNA prior to short-read sequencing makes it difficult to interpret the combination of authentic RNA processing events and remains an unsolved problem [7].

We investigated whether long-read direct RNA sequencing (DRS) with nanopores [8] could reveal the complexity of Arabidopsis mRNA processing and modifications. In nanopore DRS, the protein pore (nanopore) sits in a membrane through which an electrical current is passed, and intact RNA is fed through the nanopore by a motor protein [8]. Each RNA sequence within the nanopore (5 bases) can be identified by the magnitude of signal it produces. Arabidopsis is a pathfinder model in plant biology, and its genome annotation strongly influences the annotation and our understanding of what other plant genomes encode. We applied nanopore DRS and Illumina RNAseq to wild-type Arabidopsis (Col-0) and mutants defective in m6A [9] and exosome-mediated RNA decay [10]. We reveal m6A and combinations of RNA processing events (alternative patterns of 5′ capped transcription start sites, splicing, 3′ polyadenylation and poly(A) tail length) in full-length Arabidopsis mRNAs transcriptome-wide.

Results

Nanopore DRS detects long, complex mRNAs and short, structured non-coding RNAs

We purified poly(A)+ RNA from four biological replicates of 14-day-old Arabidopsis Col-0 seedlings. We incorporated synthetic External RNA Controls Consortium (ERCC) RNA Spike-In mixes into all replicates [11,12] and carried out nanopore DRS. Illumina RNAseq was performed in parallel on similar material. Using Guppy base-calling (Oxford Nanopore Technologies) and minimap2 alignment software [13], we identified around 1 mi...
Antisense transcription is known to have a range of impacts on sense gene expression, including (but not limited to) impeding transcription initiation, disrupting post-transcriptional processes, and enhancing, slowing, or even preventing transcription of the sense gene. Strand-specific RNA-Seq protocols preserve the strand information of the original RNA in the data, and so can be used to identify where antisense transcription may be implicated in regulating gene expression. However, our analysis of 199 strand-specific RNA-Seq experiments reveals that spurious antisense reads are often present in these datasets at levels greater than 1% of sense gene expression levels. Furthermore, these levels can vary substantially even between replicates in the same experiment, potentially disrupting any downstream analysis if the incorrectly assigned antisense counts dominate the set of genes with high antisense transcription levels. Currently, no tools exist to detect or correct for this spurious antisense signal. Our tool, RoSA (Removal of Spurious Antisense), detects the presence of high levels of spurious antisense read alignments in strand-specific RNA-Seq datasets. It uses incorrectly spliced reads on the antisense strand and/or ERCC spike-ins (if present in the data) to calculate both global and gene-specific antisense correction factors. We demonstrate the utility of our tool to filter out spurious antisense transcript counts in an Arabidopsis thaliana RNA-Seq experiment. Availability: RoSA is open source software available under the GPL licence via the Barton Group GitHub page https://github.com/bartongroup.
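The idea of a global correction factor derived from spike-ins can be illustrated with a minimal sketch. This is not RoSA's implementation; it only assumes that spike-in RNAs are single-stranded, so any read aligned antisense to a spike-in is spurious, and that the same antisense-to-sense ratio applies to every gene. The function names, the input layout and the spike-in labels are hypothetical.

```python
from typing import Dict, Tuple


def global_spurious_rate(spikein_counts: Dict[str, Tuple[int, int]]) -> float:
    """Estimate the spurious antisense rate from spike-in read counts.

    spikein_counts maps a spike-in label to (sense_reads, antisense_reads).
    The rate is pooled antisense reads divided by pooled sense reads,
    i.e. spurious antisense expressed as a fraction of sense expression.
    """
    sense = sum(s for s, _ in spikein_counts.values())
    antisense = sum(a for _, a in spikein_counts.values())
    if sense == 0:
        raise ValueError("no sense spike-in reads; rate cannot be estimated")
    return antisense / sense


def correct_antisense(sense: int, antisense: int, rate: float) -> float:
    """Subtract the expected spurious antisense count for one gene.

    The expected spurious count is rate * sense; the corrected value is
    clamped at zero because a count cannot be negative.
    """
    expected_spurious = rate * sense
    return max(0.0, antisense - expected_spurious)


# Hypothetical spike-in counts: (sense, antisense) per spike-in.
spikes = {"spike_A": (9800, 200), "spike_B": (200, 0)}
rate = global_spurious_rate(spikes)

# A gene whose antisense signal exceeds the expected spurious level keeps
# the excess; a gene whose antisense signal is below it is set to zero.
kept = correct_antisense(sense=1000, antisense=50, rate=rate)
removed = correct_antisense(sense=5000, antisense=60, rate=rate)
```

A pooled ratio is used here so that highly expressed spike-ins dominate the estimate; a gene-specific factor, as described in the abstract, would need additional evidence such as incorrectly spliced antisense reads.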
Motivation: RNA-seq experiments are usually carried out in three or fewer replicates. In order to work well with so few samples, differential gene expression (DGE) tools typically assume the form of the underlying gene expression distribution. In this paper, the statistical properties of gene expression from RNA-seq are investigated in the complex eukaryote, Arabidopsis thaliana, extending and generalizing the results of previous work in the simple eukaryote Saccharomyces cerevisiae. Results: We show that, consistent with the results in S. cerevisiae, more gene expression measurements in A. thaliana are consistent with being drawn from an underlying negative binomial distribution than either a log-normal distribution or a normal distribution, and that the size and complexity of the A. thaliana transcriptome does not influence the false positive rate performance of nine widely used DGE tools tested here. We therefore recommend the use of DGE tools that are based on the negative binomial distribution. Availability and implementation: The raw data for the 17 WT Arabidopsis thaliana datasets is available from the European Nucleotide Archive (E-MTAB-5446). The processed and aligned data can be visualized in context using IGB (Freese et al., 2016), or downloaded directly, using our publicly available IGB quickload server at https://compbio.lifesci.dundee.ac.uk/arabidopsisQuickload/public_quickload/ under ‘RNAseq>Froussios2019’. All scripts and commands are available from GitHub at https://github.com/bartongroup/KF_arabidopsis-GRNA. Supplementary information: Supplementary data are available at Bioinformatics online.
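The distributional assumption discussed above can be made concrete with a small sketch. It is not the goodness-of-fit testing used in the paper; it only illustrates the mean-variance relation of the negative binomial in the parameterization var = mu + alpha * mu^2, where alpha = 0 recovers the Poisson case. The function name and the example counts are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence


def nb_dispersion(counts: Sequence[int]) -> float:
    """Method-of-moments estimate of the negative binomial dispersion alpha.

    Solves var = mu + alpha * mu**2 for alpha using the sample mean and
    the sample variance (n - 1 denominator). Estimates below zero, which
    arise when replicates vary less than Poisson noise predicts, are
    clamped to zero.
    """
    if len(counts) < 2:
        raise ValueError("at least two replicates are needed for a variance")
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)
    return max(0.0, (var - mu) / mu ** 2)


# Hypothetical read counts for one gene across three replicates.
overdispersed = nb_dispersion([10, 20, 30])   # variance well above the mean
underdispersed = nb_dispersion([5, 5, 5])     # no variation between replicates
```

With only a handful of replicates a per-gene estimate like this is very noisy, which is one reason DGE tools typically share information across genes rather than estimating each dispersion in isolation.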