Manatee: detection and quantification of small non-coding RNAs from next-generation sequencing data

Handzlik, Joanna E.; Tastsoglou, Spyros; Vlachos, Ioannis S.; Hatzigeorgiou, Artemis G.

doi:10.1038/s41598-020-57495-9

Cited by 22 publications

(16 citation statements)

References 34 publications


“…Probabilistic approaches such as RSEM, Kallisto and Salmon statistically weight transcript or isoform candidates, and are more suitable for quantifying well-characterized transcriptomes [24–26]. In small-RNA quantification, algorithms consider neighboring patterns around each multi-mapping alignment [27, 28]. Mmquant reports multi-mappers as merged gene counts [29], and GeneQC employs Machine Learning to provide the user with uncertainty estimates for ambiguous alignments [30].…”

Section: Introduction (mentioning)

confidence: 99%
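The probabilistic weighting that the passage above attributes to RSEM, Kallisto and Salmon can be illustrated with a bare-bones expectation-maximization (EM) loop over read-to-transcript compatibility sets. This is a simplified sketch, not the implementation of any of those tools. It ignores transcript length, fragment-length models and sequencing error, and the function name `em_abundance` and its input format are hypothetical.

```python
from collections import defaultdict


def em_abundance(read_hits, n_iter=200, tol=1e-8):
    """Estimate relative abundances from reads that may map to several transcripts.

    read_hits: one list of compatible transcript IDs per read (hypothetical format).
    Simplified EM sketch: no length correction, no error model.
    """
    transcripts = sorted({t for hits in read_hits for t in hits})
    # Start from a uniform abundance vector.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each read across its candidates in proportion to theta.
        counts = defaultdict(float)
        for hits in read_hits:
            z = sum(theta[t] for t in hits)
            for t in hits:
                counts[t] += theta[t] / z
        # M-step: renormalize the expected counts into new abundances.
        total = sum(counts.values())
        new_theta = {t: counts[t] / total for t in transcripts}
        delta = max(abs(new_theta[t] - theta[t]) for t in transcripts)
        theta = new_theta
        if delta < tol:
            break
    return theta
```

With three reads unique to transcript `A` and one read compatible with both `A` and `B`, calling `em_abundance([["A"], ["A"], ["A"], ["A", "B"]])` pushes the shared read toward `A`, where the unique evidence lies, and the estimate for `B` shrinks toward zero.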

MGcount: a total RNA-seq quantification tool to address multi-mapping and multi-overlapping alignments ambiguity in non-coding transcripts

et al. 2022


Background: Total-RNA sequencing (total-RNA-seq) allows the simultaneous study of both the coding and the non-coding transcriptome. Yet computational pipelines have traditionally focused on particular biotypes, making assumptions that total-RNA-seq datasets do not fulfill. Transcripts from distinct RNA biotypes vary in length, biogenesis, and function, can overlap in a genomic region, and may be present in the genome with a high copy number. Consequently, reads from total-RNA-seq libraries may cause ambiguous genomic alignments, demanding flexible quantification approaches.

Results: Here we present Multi-Graph count (MGcount), a total-RNA-seq quantification tool combining two strategies for handling ambiguous alignments. First, MGcount assigns reads hierarchically to small-RNA and long-RNA features to account for length disparity when transcripts overlap in the same genomic position. Next, MGcount aggregates RNA products with similar sequences where reads systematically multi-map using a graph-based approach. MGcount outputs a transcriptomic count matrix compatible with RNA-sequencing downstream analysis pipelines, with both bulk and single-cell resolution, and the graphs that model repeated transcript structures for different biotypes. The software can be used as a Python module or as a single-file executable program.

Conclusions: MGcount is a flexible total-RNA-seq quantification tool that successfully integrates reads that align to multiple genomic locations or that overlap with multiple gene features. Its approach is suitable for the simultaneous estimation of protein-coding, long non-coding and small non-coding transcript concentration, in both precursor and processed forms. Both source code and compiled software are available at https://github.com/hitaandrea/MGcount.
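MGcount's second step, aggregating features on which reads systematically multi-map, can be pictured as a graph whose nodes are features and whose edges join features that share many multi-mapping reads; each connected group is then counted as one unit. The sketch below uses plain connected components (union-find) with a hypothetical `min_shared_fraction` cutoff. It is not MGcount's actual graph construction or clustering procedure, which the abstract does not spell out, and the function name and input format are assumptions.

```python
from collections import Counter
from itertools import combinations


def aggregate_features(read_hits, min_shared_fraction=0.5):
    """Group features that share many multi-mapping reads (hypothetical sketch).

    read_hits: one list of feature IDs per read.
    Returns a sorted list of sorted feature groups.
    """
    reads_per_feature = Counter()
    shared = Counter()
    for hits in read_hits:
        feats = sorted(set(hits))
        for f in feats:
            reads_per_feature[f] += 1
        # Every pair of features hit by the same read gets one shared read.
        for a, b in combinations(feats, 2):
            shared[(a, b)] += 1

    parent = {f: f for f in reads_per_feature}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in shared.items():
        # Link two features when shared reads make up at least the given
        # fraction of the reads on the less-covered of the two.
        if n / min(reads_per_feature[a], reads_per_feature[b]) >= min_shared_fraction:
            parent[find(a)] = find(b)

    groups = {}
    for f in reads_per_feature:
        groups.setdefault(find(f), set()).add(f)
    return sorted(sorted(g) for g in groups.values())
```

A threshold relative to the less-covered feature keeps a single stray read from merging a well-covered feature into an unrelated family; lowering the cutoff merges more aggressively.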



“…The majority of such tools proved to be unusable for our current application due to input file size, read length, or small RNA type restrictions (e.g., see Table S1). Two contemporary tools that were amenable to our deep-sequenced input are the widely applied and well-established ShortStack [24] and the recently developed Manatee [25]. Compared to the output of these tools, DANSR supplies users with complete information to characterize and prioritize discovered small RNAs: every candidate cluster is reported with genomic range, strand, total number of mapped reads, number of uniquely mapped reads, number of reads shared with other clusters, and annotation group (annotated, unannotated, or low quality); annotated clusters additionally report the name and biotype of all candidate features, the categories of candidate features (small RNA, protein coding gene, pseudogene, and lncRNA), the best feature, and its associated Jaccard score.…”

Section: Results (mentioning)

confidence: 99%
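The Jaccard score reported for a cluster's best feature can be read, assuming it compares genomic intervals, as the length of the overlap divided by the length of the union of the cluster and the feature. The sketch below implements that interval version together with a best-feature pick. The `Interval` type, half-open 0-based coordinates, and the requirement that chromosome and strand match are assumptions, not DANSR's documented behavior.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    strand: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def jaccard(a: Interval, b: Interval) -> float:
    """Overlap length / union length; 0.0 on different chromosome or strand."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return 0.0
    inter = max(0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def best_feature(cluster: Interval, features: dict[str, Interval]) -> tuple[Optional[str], float]:
    """Return the feature name with the highest Jaccard score, or (None, 0.0)."""
    best_name, best_score = None, 0.0
    for name, feat in features.items():
        score = jaccard(cluster, feat)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

For a cluster spanning 100–160 on `chr1`/`+`, a feature spanning 90–170 on the same strand shares 60 of 80 covered bases and scores 0.75, while a feature that only clips the cluster's edge scores far lower.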

DANSR: A Tool for the Detection of Annotated and Novel Small RNAs

Zhang, Eteleeb, Rozycki, et al. 2022

ncRNA


Existing small noncoding RNA analysis tools are optimized for processing short sequencing reads (17–35 nucleotides) to monitor microRNA expression. However, these strategies under-represent many biologically relevant classes of small noncoding RNAs in the 36–200 nucleotides length range (tRNAs, snoRNAs, etc.). To address this, we developed DANSR, a tool for the detection of annotated and novel small RNAs using sequencing reads with variable lengths (ranging from 17–200 nt). While DANSR is broadly applicable to any small RNA dataset, we applied it to a cohort of matched normal, primary, and distant metastatic colorectal cancer specimens to demonstrate its ability to quantify annotated small RNAs, discover novel genes, and calculate differential expression. DANSR is available as an open source tool.


“…Manatee [136] is an algorithm for the quantification of sRNA classes. In contrast to many available sRNA analysis pipelines, Manatee rescues highly multimapping and unaligned reads based on available annotation and robust density information and is capable of identifying and quantifying expression from isomiRs and unannotated loci that could give rise to yet unknown sRNAs.…”

Section: Methods and Techniques (mentioning)

confidence: 99%
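One way to picture annotation- and density-guided rescue of multimapping reads is to give each such read fractionally to its candidate loci in proportion to the number of uniquely mapped reads at each locus, preferring annotated loci whenever any are among the candidates. This is a generic heuristic sketch under those stated assumptions, not Manatee's published algorithm. The function name and inputs are hypothetical, and the rescue of unaligned reads is not modeled.

```python
from collections import defaultdict


def rescue_multimappers(unique_counts, multi_reads, annotated=frozenset()):
    """Distribute multimapping reads across candidate loci (hypothetical sketch).

    unique_counts: locus -> number of uniquely mapped reads.
    multi_reads: one list of candidate loci per multimapping read.
    annotated: loci overlapping known annotation, assumed to be preferred.
    """
    counts = defaultdict(float, unique_counts)
    for loci in multi_reads:
        # Restrict to annotated candidates when at least one is present.
        pool = [locus for locus in loci if locus in annotated] or list(loci)
        # Weight each candidate by its unique-read density.
        weights = [unique_counts.get(locus, 0) for locus in pool]
        total = sum(weights)
        if total == 0:
            # No unique-read density to guide placement: split evenly.
            weights, total = [1] * len(pool), len(pool)
        for locus, w in zip(pool, weights):
            counts[locus] += w / total
    return dict(counts)
```

Weighting by unique reads, rather than by the running totals, keeps the result independent of the order in which multimapping reads are processed; every read still contributes exactly one count in total.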

Bioinformatics and Machine Learning Approaches to Understand the Regulation of Mobile Genetic Elements

Giassa, Alexiou 2021

Biology


Transposable elements (TEs, or mobile genetic elements, MGEs) are ubiquitous genetic elements that make up a substantial proportion of the genome of many species. The recent growing interest in understanding the evolution and function of TEs has revealed that TEs play a dual role in genome evolution, development, disease, and drug resistance. Cells regulate TE expression against uncontrolled activity that can lead to developmental defects and disease, using multiple strategies, such as DNA chemical modification, small RNA (sRNA) silencing, chromatin modification, as well as sequence-specific repressors. Advancements in bioinformatics and machine learning approaches are increasingly contributing to the analysis of the regulation mechanisms. A plethora of tools and machine learning approaches have been developed for prediction, annotation, and expression profiling of sRNAs, for methylation analysis of TEs, as well as for genome-wide methylation analysis through bisulfite sequencing data. In this review, we provide a guided overview of the bioinformatic and machine learning state of the art of fields closely associated with TE regulation and function.

