Traditional epitranscriptomics relies on capturing a single RNA modification by antibody or chemical treatment, combined with short-read sequencing to identify its transcriptomic location. This approach is labor-intensive and may introduce experimental artifacts. Direct sequencing of native RNA using Oxford Nanopore Technologies (ONT) allows RNA base modifications to be detected directly, although these modifications might appear as sequencing errors. The percent Error of Specific Bases (%ESB) was higher for native RNA than for unmodified RNA, which enabled the detection of ribonucleotide modification sites. Based on the %ESB differences, we developed a bioinformatic tool, epitranscriptional landscape inferring from glitches of ONT signals (ELIGOS), that is based on various types of synthetic modified RNA and applied to rRNA and mRNA. ELIGOS is able to accurately predict known classes of RNA methylation sites (AUC > 0.93) in rRNAs from Escherichia coli, yeast, and human cells, using as the reference either unmodified in vitro transcribed RNA or a background error model that mimics the systematic error of direct RNA sequencing. The well-known DRACH/RRACH motif was localized and identified, consistent with previous studies, using differential analysis of ELIGOS to study the impact of RNA m6A methyltransferase by comparing wild type and knockouts in yeast and mouse cells. Lastly, the DRACH motif could also be identified in the mRNA of three human cell lines. The mRNA modifications identified by ELIGOS are resolved at the level of individual bases. In summary, we have developed a bioinformatic software package to uncover native RNA modifications.
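The %ESB comparison described in this abstract can be illustrated with a minimal sketch. It assumes that %ESB at a position is the share of covering reads that disagree with the reference base, and that a site is flagged when the native sample shows a clearly higher error rate than the unmodified control. The function names, the one-sided Fisher's exact test, the 0.5 pseudocount, and both thresholds are illustrative assumptions, not the actual ELIGOS implementation.

```python
from math import comb


def percent_esb(errors: int, coverage: int) -> float:
    """Percent of reads covering a base that disagree with the reference there."""
    if coverage == 0:
        return 0.0
    return 100.0 * errors / coverage


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are native and control; columns are erroneous and correct reads.
    Returns the probability of at least `a` native errors with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def flag_site(native_err: int, native_cov: int, ctrl_err: int, ctrl_cov: int,
              min_odds: float = 2.0, alpha: float = 0.01) -> bool:
    """Flag a position whose native error rate clearly exceeds the control's.

    min_odds and alpha are hypothetical cut-offs chosen for illustration.
    """
    a, b = native_err, native_cov - native_err
    c, d = ctrl_err, ctrl_cov - ctrl_err
    # A 0.5 pseudocount keeps the odds ratio finite when a cell is zero.
    odds = ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
    return odds >= min_odds and fisher_greater(a, b, c, d) < alpha


# Hypothetical counts at one position: 30 of 100 native reads are erroneous,
# against 5 of 100 reads from the unmodified control.
print(percent_esb(30, 100), percent_esb(5, 100))
print(flag_site(30, 100, 5, 100))
```

Requiring both an effect size and a significance test is one way to avoid flagging high-coverage positions where a tiny difference in error rate is statistically significant but not meaningful.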
In this study, more than one hundred thousand Escherichia coli and Shigella genomes were examined and classified. This is, to our knowledge, the largest E. coli genome dataset analyzed to date. A Mash-based analysis of a cleaned set of 10,667 E. coli genomes from GenBank revealed 14 distinct phylogroups. A representative genome or medoid identified for each phylogroup was used as a proxy to classify 95,525 unassembled genomes from the Sequence Read Archive (SRA). We find that most of the sequenced E. coli genomes belong to four phylogroups (A, C, B1 and E2(O157)). Authenticity of the 14 phylogroups is supported by several different lines of evidence: phylogroup-specific core genes, a phylogenetic tree constructed with 2613 single copy core genes, and differences in the rates of gene gain/loss/duplication. The methodology used in this work is able to reproduce known phylogroups, as well as to identify previously uncharacterized phylogroups in E. coli species.
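The medoid-as-proxy idea in this abstract can be sketched as follows. The example assumes pairwise genome distances (such as Mash distances) are already available, picks the member of each group with the smallest total distance to the other members as the medoid, and then assigns a new genome to the group whose medoid is nearest. The genome names, group labels, and distance values are made up for illustration, and the study's actual pipeline may differ.

```python
def symmetric(pairs):
    """Build a nested distance dict from {(g, h): distance}, with zero diagonal."""
    dist = {}
    for (g, h), value in pairs.items():
        dist.setdefault(g, {})[h] = value
        dist.setdefault(h, {})[g] = value
    for g in dist:
        dist[g][g] = 0.0
    return dist


def medoid(members, dist):
    """Return the member with the smallest summed distance to all members."""
    return min(members, key=lambda g: sum(dist[g][h] for h in members))


def classify(query_dist, medoids):
    """Assign a query to the group whose medoid is closest.

    query_dist maps each medoid genome to its distance from the query;
    medoids maps each group label to its medoid genome.
    """
    return min(medoids, key=lambda group: query_dist[medoids[group]])


# Hypothetical distances between five assembled genomes.
dist = symmetric({
    ("g1", "g2"): 0.010, ("g1", "g3"): 0.020, ("g2", "g3"): 0.015,
    ("g4", "g5"): 0.010,
    ("g1", "g4"): 0.050, ("g1", "g5"): 0.050, ("g2", "g4"): 0.050,
    ("g2", "g5"): 0.050, ("g3", "g4"): 0.050, ("g3", "g5"): 0.050,
})
groups = {"A": ["g1", "g2", "g3"], "B": ["g4", "g5"]}
medoids = {label: medoid(members, dist) for label, members in groups.items()}

# Hypothetical distances from one unassembled read set to each medoid.
query = {medoids["A"]: 0.012, medoids["B"]: 0.060}
print(medoids, classify(query, medoids))
```

Comparing each new sample only against one representative per group keeps the cost linear in the number of groups, which is what makes classifying tens of thousands of read sets practical.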
The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants. Their diversity influences the phylogenetic diversity and heterogeneity of these communities. On the basis of average amino acid identity, comparative genome analysis of >1,000 Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides (eastern cottonwood) trees, resulted in consistent and robust genomic clusters with phylogenetic homogeneity. All Pseudomonas aeruginosa genomes clustered together, and these were clearly distinct from other Pseudomonas species groups on the basis of pangenome and core genome analyses. In contrast, the genomes of Pseudomonas fluorescens were organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. Most of our 21 Populus-associated isolates formed three distinct subgroups within the major P. fluorescens group, supported by pathway profile analysis, while two isolates were more closely related to Pseudomonas chlororaphis and Pseudomonas putida. Genes specific to Populus-associated subgroups were identified. Genes specific to subgroup 1 include several sensory systems that act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor. Genes specific to subgroup 2 contain hypothetical genes, and genes specific to subgroup 3 were annotated with hydrolase activity. This study justifies the need to sequence multiple isolates, especially from P. fluorescens, which displays the most genetic variation, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains to promote growth and increase disease resistance in plants.
Sequencing of native RNA and corresponding cDNA was performed using Oxford Nanopore Technology. The % Error of Specific Bases (%ESB) was higher for native RNA than for cDNA, which enabled detection of ribonucleotide modification sites. Based on %ESB differences of the two templates, a bioinformatic tool ELIGOS was developed and applied to rRNAs of E. coli, yeast and human cells. ELIGOS captured 91%, 95%, and ~75%, respectively, of the known variety of RNA methylation sites in these rRNAs. Yeast transcriptomes from different growth conditions were also compared, which identified an association between metabolic adaptation and inferred RNA modifications. ELIGOS was further applied to human transcriptome datasets, which identified the well-known DRACH motif containing N6-methyladenine being located close to 3'-untranslated regions of mRNA. Moreover, the RNA G-quadruplex motif was uncovered by ELIGOS. In summary, we have developed an experimental method coupled with bioinformatic software to uncover native RNA modifications and secondary structures within transcripts.
MAIN TEXT
The transcriptome is the collection of all RNA molecules present in a given cell that can be determined by high-throughput techniques, such as microarray analysis or RNA sequencing (RNA-seq) methods [1]. RNA-seq using next-generation sequencing (NGS) techniques has been replacing microarray analysis, since the former is able to detect novel or unknown transcripts. Further, NGS enables transcriptome analysis with a higher dynamic range of expression levels than for microarrays [2]. With improved sample preparation methods and reduced sequencing costs, RNA-seq by NGS has become the method of choice to study transcriptomes.
The length of sequence reads generated with most NGS platforms ranges from 35 nt up to about 500 nt, so that single reads rarely cover a complete transcript. Accurate alignment and assembly of such short sequences depends on availability of a reference genome, and the identification of spliced isoforms or gene-fusion transcripts remains a challenge [3]. Further, methods depending on reverse transcription (RT) of RNA and amplification may introduce biases and artifacts [4]. These shortcomings can be overcome by directly sequencing native RNA molecules using technologies such as the Oxford Nanopore Technologies (ONT) platform. Direct RNA sequencing without amplification (dRNA-seq) is able to generate long reads, typically covering the full length of a transcript [5]. The method can accurately quantify transcripts in order to analyze differential gene expression with a dynamic range comparable to traditional RNA-seq derived from short read sequencing, while it enables accurate identification of the structure and boundaries of transcripts including spliced products [6].
An additional advantage of dRNA-seq is the detection of transcriptional modifications inferred from the current signal as the RNA molecule passes a nanopore: modified RN...
The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
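The oligomer frequency analysis mentioned in the ebolavirus abstract can be sketched in a few lines. The example assumes genomes are given as DNA-alphabet strings, counts overlapping k-mers, normalizes the counts to frequencies, and compares two profiles by Euclidean distance. The choice of k, the distance measure, and the toy sequences are assumptions for illustration only; the abstract does not state which settings the study used.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(seq: str, k: int = 4) -> dict:
    """Relative frequencies of overlapping k-mers, ignoring ambiguous bases."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Drop k-mers containing anything other than A, C, G or T (e.g. N).
    valid = {kmer: n for kmer, n in counts.items() if set(kmer) <= set("ACGT")}
    total = sum(valid.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in valid.items()}


def euclidean(p: dict, q: dict) -> float:
    """Euclidean distance between two sparse frequency profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


# Made-up toy sequences, not real viral genomes.
genome_a = "ATGGATTCTCGTCCTCAGAAAGTCTGGATGACGCCG"
genome_b = "ATGGATTCTCGTCCTCAGAAAGTTTGGATGACACCG"
genome_c = "GCGCGCGGCCGCGCGGGCCCGCGCGCCGGGCGCGCC"

profile_a = kmer_frequencies(genome_a, k=3)
profile_b = kmer_frequencies(genome_b, k=3)
profile_c = kmer_frequencies(genome_c, k=3)
print(euclidean(profile_a, profile_b), euclidean(profile_a, profile_c))
```

Because the profiles are alignment-free, this kind of comparison also works between genomes that share too little sequence similarity to align, which is why it suits comparisons across viral families.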