Background: Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because they generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms.
Results: SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster due to the use of the DIAMOND aligner, and more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, example input and output files, and example master scripts for full pipeline execution.
Conclusions: SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered, or customized to fit the needs of any experiment.
Summary: The precisionFDA Truth Challenge V2 aimed to assess the state of the art in variant calling in difficult-to-map regions and the Major Histocompatibility Complex (MHC). Starting with FASTQ files, 20 challenge participants applied their variant calling pipelines and submitted 64 variant callsets for one or more sequencing technologies (~35X Illumina, ~35X PacBio HiFi, and ~50X Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with the new GIAB benchmark sets and genome stratifications. Challenge submissions included a number of innovative methods for all three technologies, with graph-based and machine-learning methods scoring best for short-read and long-read datasets, respectively. New methods outperformed the 2016 Truth Challenge winners, and new machine-learning approaches combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.
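The abstract above does not say which metrics were used to score submissions. As a minimal sketch, assuming the precision, recall, and F1 measures that are commonly reported when benchmarking small variants, a callset could be summarized from its true-positive, false-positive, and false-negative counts as follows. The function name and the example counts are hypothetical and are not taken from the challenge.

```python
from typing import Tuple


def benchmark_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for one callset.

    tp: calls that match the benchmark set
    fp: calls absent from the benchmark set
    fn: benchmark variants that were not called
    Simplification (assumption): a single TP count is used for both
    precision and recall.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    p, r, f1 = benchmark_metrics(tp=90, fp=10, fn=10)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In a stratified evaluation, the same calculation would be repeated per genomic region (for example, difficult-to-map regions and the MHC) rather than once over the whole genome.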
Background: Idiopathic chronic diarrhea (ICD) is a common cause of morbidity and mortality among juvenile rhesus macaques. Characterized by chronic inflammation of the colon and repeated bouts of diarrhea, ICD is largely unresponsive to medical interventions, including corticosteroid, antiparasitic, and antibiotic treatments. Although ICD is accompanied by large disruptions in the composition of the commensal gut microbiome, no single pathogen has been concretely identified as responsible for the onset and continuation of the disease.
Results: Fecal samples were collected from 12 ICD-diagnosed macaques and 12 age- and sex-matched controls. RNA was extracted for metatranscriptomic analysis of organisms and functional annotations associated with the gut microbiome. Bacterial, fungal, archaeal, protozoan, and macaque (host) transcripts were simultaneously assessed. ICD-afflicted animals were characterized by increased expression of host-derived genes involved in inflammation and increased transcripts from bacterial pathogens such as Campylobacter and Helicobacter and the protozoan Trichomonas. Transcripts associated with known mucin-degrading organisms and mucin-degrading enzymes were elevated in the fecal microbiomes of ICD-afflicted animals. Assessment of colon sections using immunohistochemistry and of the host transcriptome suggests differential fucosylation of mucins between control and ICD-afflicted animals. Interrogation of the metatranscriptome for fucose utilization genes reveals possible mechanisms by which opportunists persist in ICD. Bacteroides sp. potentially cross-fed fucose to Haemophilus, whereas Campylobacter expressed a mucosa-associated transcriptome with increased expression of adherence genes.
Conclusions: The simultaneous profiling of bacterial, fungal, archaeal, protozoan, and macaque transcripts from stool samples reveals that ICD of rhesus macaques is associated with increased gene expression by pathogens, increased mucin degradation, and altered fucose utilization. The data suggest that the ICD-afflicted host produces fucosylated mucins that are leveraged by potentially pathogenic microbes as a carbon source or as adhesion sites.
Electronic supplementary material: The online version of this article (10.1186/s40168-019-0664-z) contains supplementary material, which is available to authorized users.
Background: Although metatranscriptomics (the study of diverse microbial population activity based on RNA-seq data) is rapidly growing in popularity, there are limited options for biologists to analyze this type of data. Current approaches for processing metatranscriptomes rely on restricted databases and a dedicated computing cluster, or on metagenome-based approaches that have not been fully evaluated for processing metatranscriptomic datasets. We created a new bioinformatics pipeline, designed specifically for metatranscriptome dataset analysis, which runs in conjunction with Metagenome-RAST (MG-RAST) servers. Designed for use by researchers with relatively little bioinformatics experience, SAMSA offers a breakdown of metatranscriptome transcription activity levels by organism or transcript function, and is fully open source. We used this new tool to evaluate best practices for sequencing stool metatranscriptomes.
Results: Working with the MG-RAST annotation server, we constructed the Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) software package, a complete pipeline for the analysis of gut microbiome data. SAMSA can summarize and evaluate raw annotation results, identifying abundant species and significant functional differences between metatranscriptomes. Using pilot data and simulated subsets, we determined experimental requirements for fecal gut metatranscriptomes. Sequences need to be either long reads (longer than 100 bp) or joined paired-end reads. Each sample needs 40–50 million raw sequences, which can be expected to yield the 5–10 million annotated reads necessary for accurate abundance measures. We also demonstrated that ribosomal RNA depletion does not equally deplete ribosomes from all species within a sample, and remaining rRNA sequences should be discarded. Using publicly available metatranscriptome data in which rRNA was not depleted, we were able to demonstrate that overall organism transcriptional activity can be measured using mRNA counts. We were also able to detect significant differences between control and experimental groups in both organism transcriptional activity and specific cellular functions.
Conclusions: By making this new pipeline publicly available, we have created a powerful new tool for metatranscriptomics research, offering a new method for greater insight into the activity of diverse microbial communities. We further recommend that stool metatranscriptomes be ribodepleted and sequenced in a 100 bp paired-end format with a minimum of 40 million reads per sample.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1270-8) contains supplementary material, which is available to authorized users.
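The sequencing requirements stated in the SAMSA abstract above (at least 40 million raw reads per sample, at least 5 million annotated reads, and reads either longer than 100 bp or joined paired-end) lend themselves to a simple per-sample check. The following is a minimal sketch of such a check, not part of SAMSA itself. The class, function, and field names are hypothetical; only the threshold values come from the abstract.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted in the abstract; constant names are illustrative.
MIN_RAW_READS = 40_000_000        # minimum raw sequences per sample
MIN_ANNOTATED_READS = 5_000_000   # lower bound for accurate abundance measures
LONG_READ_CUTOFF_BP = 100         # reads must be longer than this, or joined pairs


@dataclass
class SampleStats:
    """Hypothetical per-sample summary; not a SAMSA data structure."""
    name: str
    raw_reads: int
    annotated_reads: int
    read_length_bp: int
    paired_end_joined: bool


def check_sample(s: SampleStats) -> List[str]:
    """Return a list of human-readable problems; empty if the sample passes."""
    problems = []
    if s.raw_reads < MIN_RAW_READS:
        problems.append(f"{s.name}: only {s.raw_reads:,} raw reads")
    if s.annotated_reads < MIN_ANNOTATED_READS:
        problems.append(f"{s.name}: only {s.annotated_reads:,} annotated reads")
    if s.read_length_bp <= LONG_READ_CUTOFF_BP and not s.paired_end_joined:
        problems.append(f"{s.name}: short reads that are not joined pairs")
    return problems


def annotation_yield(s: SampleStats) -> float:
    """Fraction of raw reads that received an annotation."""
    return s.annotated_reads / s.raw_reads


if __name__ == "__main__":
    # Made-up sample, for illustration only.
    sample = SampleStats("stool_01", 45_000_000, 7_000_000, 100, True)
    print(check_sample(sample) or "passes all checks")
    print(f"annotation yield: {annotation_yield(sample):.1%}")
```

Taken at their extremes, the figures in the abstract imply an annotation yield of roughly 10% (5 million of 50 million) to 25% (10 million of 40 million), which is a useful sanity range when inspecting the output of `annotation_yield`.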