Reverse transcription (RT) of RNA templates containing RNA modifications leads to synthesis of cDNA containing information on the modification in the form of misincorporation, arrest, or nucleotide skipping events. A compilation of such events from multiple cDNAs represents an RT-signature that is typical for a given modification, but, as we show here, also depends on the reverse transcriptase enzyme. A comparison of 13 different enzymes revealed a range of RT-signatures, with individual enzymes exhibiting average arrest rates between 20 and 75%, as well as average misincorporation rates between 30 and 75% in the read-through cDNA. Using RT-signatures from individual enzymes to train a random forest model as a machine learning regimen for prediction of modifications, we found widely varying success rates for the prediction of methylated purines, as exemplified with N1-methyladenosine (m1A). Among the 13 enzymes, a correlation was found between read length, misincorporation, and prediction success. Conversely, low average read length was correlated with high arrest rate and lower prediction success. The three most successful polymerases were then applied to the characterization of RT-signatures of other methylated purines. Guanosines featuring methyl groups on the Watson-Crick face were identified with high confidence, but discrimination between m1G and m22G was only partially successful. In summary, the results suggest that, given sufficient coverage and a set of specifically optimized reaction conditions for reverse transcription, all RNA modifications that impede Watson-Crick bonds can be distinguished by their RT-signature.
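As an illustration of how an RT-signature might be compiled for a single site, the following is a minimal sketch, not the authors' implementation. The function name `rt_signature`, its inputs, and the choice of denominators are assumptions: the arrest and skip rates are taken relative to all cDNAs reaching the site, while the misincorporation rate is taken relative to read-through cDNA only, as the abstract describes.

```python
from collections import Counter


def rt_signature(ref_base, read_through_bases, n_arrest, n_skip):
    """Summarise RT events at one site as a feature dictionary.

    ref_base           -- reference nucleotide at the site (hypothetical input)
    read_through_bases -- bases called at the site in cDNAs that read through
    n_arrest           -- number of cDNAs that terminated at the site
    n_skip             -- number of cDNAs that skipped the nucleotide
    """
    n_read = len(read_through_bases)
    total = n_read + n_arrest + n_skip
    if total == 0:
        raise ValueError("no cDNA evidence at this site")

    counts = Counter(read_through_bases)
    n_mis = n_read - counts[ref_base]

    signature = {
        # Assumed denominator: every cDNA that reached the site.
        "arrest_rate": n_arrest / total,
        "skip_rate": n_skip / total,
        # Misincorporation is measured within read-through cDNA only.
        "misincorporation_rate": n_mis / n_read if n_read else 0.0,
    }
    # Composition of the misincorporated bases, one feature per non-reference base.
    for base in "ACGT":
        if base != ref_base:
            signature[f"mis_{base}"] = counts[base] / n_mis if n_mis else 0.0
    return signature


example = rt_signature("A", list("TTTGAA"), n_arrest=3, n_skip=1)
```

Feature dictionaries of this kind, collected over many sites, could serve as input rows for a classifier such as the random forest mentioned above.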
Methods for the detection of m6A by RNA-Seq technologies are increasingly sought after. We here present NOseq, a method to detect m6A residues in defined amplicons by virtue of their resistance to chemical deamination, effected by nitrous acid. Partial deamination in NOseq affects all exocyclic amino groups present in nucleobases and thus also changes sequence information. The method uses a mapping algorithm specifically adapted to the sequence degeneration caused by deamination events. Thus, m6A sites with partial modification levels of ∼50% were detected in defined amplicons, and this threshold can be lowered to ∼10% by combination with m6A immunoprecipitation. NOseq faithfully detected known m6A sites in human rRNA, and the long non-coding RNA MALAT1, and positively validated several m6A candidate sites, drawn from miCLIP data with an m6A antibody, in the transcriptome of Drosophila melanogaster. Conceptually related to bisulfite sequencing, NOseq presents a novel amplicon-based sequencing approach for the validation of m6A sites in defined sequences.
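The idea of matching reads against a reference despite deamination-induced sequence changes can be sketched as follows. This is a simplified illustration and not the NOseq mapping algorithm: it assumes that a deaminated A is read as G and a deaminated C is read as T, and it treats G as unchanged, which is a simplification given that the abstract states all exocyclic amino groups are affected. The names `DEGENERATE`, `degenerate_match`, and `retained_a_fraction` are hypothetical.

```python
# Assumed read-out of deamination events (simplified; G handling omitted).
DEGENERATE = {
    "A": {"A", "G"},  # deaminated A assumed to be read as G
    "C": {"C", "T"},  # deaminated C assumed to be read as T
    "G": {"G"},
    "T": {"T"},
}


def degenerate_match(reference, read):
    """Return True if the read is compatible with the reference
    when the assumed deamination conversions are allowed."""
    if len(reference) != len(read):
        return False
    return all(
        base in DEGENERATE.get(ref_base, {ref_base})
        for ref_base, base in zip(reference, read)
    )


def retained_a_fraction(reads, pos):
    """Fraction of reads that still show A at a reference A position.

    A site that resists deamination, as m6A does, is expected to retain
    A more often than an unmodified A in the same sample.
    """
    calls = [read[pos] for read in reads if read[pos] in "AG"]
    return calls.count("A") / len(calls) if calls else 0.0


reference = "ACGA"
reads = ["GTGA", "ACGA", "ATGG", "GCGA"]
compatible = [r for r in reads if degenerate_match(reference, r)]
```

Comparing the retained-A fraction at a candidate position with that of other A positions in the same amplicon is one way such a signal could be read out.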
Modification mapping from cDNA data has become a tremendously important approach in epitranscriptomics. So-called reverse transcription signatures in cDNA contain information on the position and nature of their causative RNA modifications. Data mining of high-throughput sequencing data, e.g. from Illumina platforms, is therefore rapidly growing in importance, yet the field still lacks effective tools. Here we present a versatile user-friendly graphical workflow system for modification calling based on machine learning. The workflow commences with a principal module for trimming, mapping, and postprocessing. The latter includes a quantification of mismatch and arrest rates with single-nucleotide resolution across the mapped transcriptome. Further downstream modules include tools for visualization, machine learning, and modification calling. From the machine-learning module, quality assessment parameters are provided to gauge the suitability of the initial dataset for effective machine learning and modification calling. This output is useful to improve the experimental parameters for library preparation and sequencing. In summary, the automation of the bioinformatics workflow allows a faster turnaround of the optimization cycles in modification calling.
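The quantification of mismatch and arrest rates at single-nucleotide resolution can be illustrated with a minimal sketch. It is not taken from the workflow itself. The function `position_rates` and its inputs are hypothetical, and the arrest rate is assumed here to be the relative drop in coverage from the 3'-adjacent position, since reverse transcription proceeds 3' to 5' along the template; other definitions are possible.

```python
def position_rates(coverage, mismatches):
    """Compute per-position mismatch and arrest rates.

    coverage   -- read coverage per reference position, in 5'->3' order
    mismatches -- number of mismatching base calls per position
    """
    if len(coverage) != len(mismatches):
        raise ValueError("coverage and mismatches must have equal length")

    rates = []
    for i, cov in enumerate(coverage):
        mismatch_rate = mismatches[i] / cov if cov else 0.0

        # A cDNA arrested at position i covers position i+1 but not i,
        # so arrest is estimated from the coverage drop between the two.
        downstream = coverage[i + 1] if i + 1 < len(coverage) else 0
        if downstream:
            arrest_rate = max(0.0, 1.0 - cov / downstream)
        else:
            arrest_rate = 0.0  # no downstream information available

        rates.append({"mismatch_rate": mismatch_rate, "arrest_rate": arrest_rate})
    return rates


profile = position_rates(coverage=[50, 100, 100, 80], mismatches=[5, 10, 0, 8])
```

In this toy profile, the first position shows a halving of coverage relative to its 3' neighbour, which the sketch reports as an arrest rate of 0.5. Coverage changes caused by fragment ends rather than RT arrest are not corrected for here.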
Timely information on current infection numbers during an epidemic is of crucial importance for decision makers in politics, medicine, and businesses. As information about local infection risk can guide public policy as well as individual behavior, such as the wearing of personal protective equipment or voluntary social distancing, statistical models providing such insights should be transparent and reproducible as well as accurate. Fulfilling these requirements is drastically complicated by the large amounts of data generated during exponential growth of infection numbers, and by the complexity of common inference pipelines. Here, we present CorCast -- a stable and scalable distributed architecture for the reproducible estimation of nowcasts suitable for pandemic scenarios -- and its application to the inference of district-level SARS-CoV-2 infection numbers in Germany.