Improved taxonomic assignment of rumen bacterial 16S rRNA sequences using a revised SILVA taxonomic framework

Henderson, Gemma; Yilmaz, Pelin; Kumar, Shiv; Forster, Robert J.; Kelly, William J.; Leahy, Sinead C.; Guan, Le Luo; Janssen, Peter H.

doi:10.7717/peerj.6496

Cited by 81 publications (72 citation statements)

References 47 publications (71 reference statements)


“…16S rRNA gene sequencing, WGS and the reference-based RE-RRS approach all require a reference database to assign taxonomic information to the sequences. 16S rRNA gene reference databases tend to be more comprehensive because only a single gene needs to be sequenced, enabling both culturable and unculturable microbes to be captured (10). WGS and the reference-based RE-RRS approach both need reference databases containing genome assemblies.…”

Section: Results (mentioning, confidence: 99%)

“…Historically, there have been two approaches used for sequencing metagenome samples: targeted sequencing and whole genome shotgun (WGS) sequencing. Targeted sequencing amplifies specified phylogenetically informative genes from a sample, such as the 16S rRNA gene (16S) of microbes, which typically distinguishes taxonomic groups well due to large, comprehensive databases of 16S rRNA sequences that include both culturable and unculturable organisms (9, 10). This approach usually relies on having long sequence reads (11), only captures phylogenetic variation at one gene, and is subject to PCR primer bias due to mismatches in the flanking regions where the primers bind (12).…”

Section: Introduction (mentioning, confidence: 99%)
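The primer-bias point can be made concrete by counting the positions at which a degenerate primer fails to match its binding site. The sketch below is illustrative and not taken from the cited work: it assumes the template site is already written in the primer's 5'→3' orientation and aligned to it without gaps, the function names are hypothetical, the 5-base 3'-end window is an arbitrary choice, and 27F is used only as a familiar example of a degenerate bacterial primer.

```python
# IUPAC nucleotide codes mapped to the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_mismatches(primer: str, site: str) -> int:
    """Count positions where a (possibly degenerate) primer fails to match
    an equal-length, gap-free template site in the same orientation."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(
        1 for p, s in zip(primer.upper(), site.upper())
        if s not in IUPAC[p]
    )

def three_prime_mismatches(primer: str, site: str, window: int = 5) -> int:
    """Mismatches within the last `window` bases of the primer (its 3' end);
    the window size is an arbitrary, hypothetical choice."""
    return primer_mismatches(primer[-window:], site[-window:])

if __name__ == "__main__":
    primer_27f = "AGAGTTTGATCMTGGCTCAG"  # degenerate primer, used for illustration only
    print(primer_mismatches(primer_27f, "AGAGTTTGATCCTGGCTCAG"))  # 0: M matches C
    print(primer_mismatches(primer_27f, "AGAGTTTGATTATGGCTCAG"))  # 1: C vs T
```

Mismatches near the 3' end are generally the most disruptive to primer extension, which is why the sketch reports them separately from the whole-primer count.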


A restriction enzyme reduced representation sequencing approach for low-cost, high-throughput metagenome profiling

Hess, Rowe, Stijn, et al. 2019 (preprint; self-citation)


Abstract: Microbial community profiles have been associated with a variety of traits, including methane emissions in livestock; however, these profiles can be difficult and expensive to obtain for thousands of samples. The objective of this work was to develop a low-cost, high-throughput approach to capture the diversity of the rumen microbiome. Restriction enzyme reduced representation sequencing (RE-RRS) using ApeKI or PstI, combined with two bioinformatic pipelines (reference-based and reference-free), was compared with 16S rRNA gene sequencing using repeated samples collected two weeks apart from 118 sheep that were phenotypically extreme (60 high and 58 low) for methane emitted per kg dry matter intake (n = 236). DNA was extracted from freeze-dried rumen samples using a phenol-chloroform and bead-beating protocol prior to sequencing. The resulting sequences were used to investigate the repeatability of the rumen microbial community profiles, the effects of host genetics and of laboratory and analytical method, and the genetic and phenotypic correlations with methane production. The results suggested that the best method was PstI RE-RRS analyzed with the reference-free approach via a correspondence analysis; for the first component of the correspondence analysis, the estimates were 0.62±0.06 for repeatability, 0.31±0.29 for heritability, and 0.88±0.25 and 0.64±0.05 for the genetic and phenotypic correlations with methane emissions, respectively. The reference-free approach assigned 62.0±5.7% of reads to common 65 bp tags, much higher than the 6.8±1.8% of reads assigned by the reference-based approach. Sensitivity studies suggested that approximately 2000 samples could be sequenced in a single lane on an Illumina HiSeq 2500; therefore, the current 118 samples/lane and the proposed 384 samples/lane are well within that threshold. Our approach is now being used to investigate host factors affecting the rumen and its association with a variety of production and environmental traits.
With minor adaptations, our approach could be used to obtain microbial profiles from other metagenomic samples.
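The abstract's headline figure for the reference-free pipeline is the share of reads falling into common 65 bp tags. A minimal sketch of that tag-counting step follows, under assumptions that are ours rather than the authors': single-end reads that begin with the PstI cut-site remnant `TGCAG`, and a hypothetical rule that a tag is "common" if it occurs in at least `min_samples` samples. Function names are hypothetical, and barcode removal and quality filtering are omitted.

```python
from collections import Counter

TAG_LEN = 65           # tag length reported in the abstract
CUT_REMNANT = "TGCAG"  # assumed PstI cut-site remnant at the start of each read

def sample_tags(reads, tag_len=TAG_LEN, remnant=CUT_REMNANT):
    """Count fixed-length tags in one sample: reads starting with the cut-site
    remnant are truncated to `tag_len` bases; short or off-target reads are skipped."""
    counts = Counter()
    for read in reads:
        if read.startswith(remnant) and len(read) >= tag_len:
            counts[read[:tag_len]] += 1
    return counts

def common_tag_table(samples, min_samples=2):
    """samples: dict of sample name -> list of reads.
    Returns (common_tags, table); table maps each sample to its counts for the
    tags seen in at least `min_samples` samples (a hypothetical rule)."""
    per_sample = {name: sample_tags(reads) for name, reads in samples.items()}
    presence = Counter()
    for counts in per_sample.values():
        presence.update(counts.keys())  # +1 for each sample in which a tag occurs
    common = sorted(tag for tag, n in presence.items() if n >= min_samples)
    table = {name: [counts[tag] for tag in common] for name, counts in per_sample.items()}
    return common, table

def fraction_assigned(reads, common, tag_len=TAG_LEN):
    """Share of a sample's reads whose leading `tag_len` bases form a common tag."""
    common_set = set(common)
    hits = sum(1 for read in reads if read[:tag_len] in common_set)
    return hits / len(reads) if reads else 0.0
```

The sample-by-tag table returned by `common_tag_table` is the kind of count matrix that an ordination such as correspondence analysis would then summarize.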


Section: Results (mentioning, confidence: 99%)

Section: Introduction (mentioning, confidence: 99%)

A restriction enzyme reduced representation sequencing approach for low-cost, high-throughput metagenome profiling

Hess, Rowe, Stijn, et al. 2019 (preprint; self-citation)


“…Historically, there have been two approaches used for sequencing metagenome samples: targeted sequencing and metagenome shotgun sequencing. Targeted sequencing amplifies specified phylogenetically informative genes from a sample, such as the 16S rRNA gene (16S) of microbes, which typically distinguishes taxonomic groups well due to large, comprehensive databases of 16S rRNA sequences that include both culturable and uncultured organisms [10,11]. This approach usually relies on having long sequence reads [12], only captures phylogenetic variation at one gene, and is subject to PCR primer bias due to mismatches in the flanking regions where the primers bind [13].…”

Section: Introduction (mentioning, confidence: 99%)

A restriction enzyme reduced representation sequencing approach for low-cost, high-throughput metagenome profiling

et al. 2020 (self-citation)


Microbial community profiles have been associated with a variety of traits, including methane emissions in livestock. These profiles can be difficult and expensive to obtain for thousands of samples (e.g. for accurate association of microbial profiles with traits); therefore, the objective of this work was to develop a low-cost, high-throughput approach to capture the diversity of the rumen microbiome. Restriction enzyme reduced representation sequencing (RE-RRS) using ApeKI or PstI, combined with two bioinformatic pipelines (reference-based and reference-free), was compared with bacterial 16S rRNA gene sequencing using repeated samples collected two weeks apart from 118 sheep that were phenotypically extreme (60 high and 58 low) for methane emitted per kg dry matter intake (n = 236). DNA was extracted from freeze-dried rumen samples using a phenol-chloroform and bead-beating protocol prior to RE-RRS. The resulting sequences were used to investigate the repeatability of the rumen microbial community profiles, the effect of laboratory and analytical method, and the relationship with methane production. The results suggested that the best method was PstI RE-RRS analyzed with the reference-free approach, which accounted for 53.3±5.9% of reads and had repeatabilities of 0.49±0.07 and 0.50±0.07 for the first two principal components (PC1 and PC2), phenotypic correlations with methane yield of 0.43±0.06 and 0.46±0.06 for PC1 and PC2, and explained 41±8% of the variation in methane yield. These results were significantly better (p < 0.05) than those for bacterial 16S rRNA gene sequencing of the same samples, except for the correlation between PC2 and methane yield. A sensitivity study suggested that approximately 2000 samples could be sequenced in a single lane on an Illumina HiSeq 2500, meaning that both the current 118 samples/lane and the proposed 384 samples/lane are well within that threshold.
With minor adaptations, our approach could be used to obtain microbial profiles from other metagenomic samples.
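Repeatability here is the proportion of variance in a profile score, such as PC1, that reflects consistent differences between animals across the two sampling occasions two weeks apart. The abstract does not say how it was estimated; as an assumed stand-in, the sketch below uses the classical one-way ANOVA (intraclass correlation) estimator for a balanced design, and mixed-model estimation is a common alternative. The function name is hypothetical.

```python
from statistics import fmean

def repeatability(records):
    """records: dict of animal -> list of k repeated measurements (balanced design).
    Returns the one-way ANOVA estimate R = var_animal / (var_animal + var_residual)."""
    groups = [list(v) for v in records.values()]
    if len(groups) < 2:
        raise ValueError("need at least two animals")
    n, k = len(groups), len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("each animal needs the same number (>= 2) of records")
    grand = fmean(x for g in groups for x in g)
    means = [fmean(g) for g in groups]
    # Mean squares between and within animals
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n * (k - 1))
    # Between-animal variance component, truncated at zero
    var_animal = max((ms_between - ms_within) / k, 0.0)
    total = var_animal + ms_within
    if total == 0:
        raise ValueError("no variation in the data")
    return var_animal / total
```

For instance, `repeatability({"a": [1.0, 1.2], "b": [2.0, 2.2], "c": [3.0, 2.8]})` gives about 0.976, because most of the variation lies between animals rather than between each animal's two samples.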


“…For habitats that have yet to be deeply interrogated, the access to this breadth outweighs the risk of misclassification due to annotation error. However, once a habitat is sufficiently explored, a habitat-specific database enables accurate fine-level phylogenetic resolution for taxonomic assignment to ASVs [20-30]. Existing habitat-specific databases are constructed with different methods and can be used to assign taxonomy via different approaches.…”

Section: Introduction (mentioning, confidence: 99%)

“…Existing habitat-specific databases are constructed with different methods and can be used to assign taxonomy via different approaches. Examples of this include the following: 1) stand-alone habitat-specific databases consisting of curated collections of close-to-full-length 16S rRNA gene sequences compiled both from other repositories and by generating new sequences from the habitat of interest, e.g., eHOMD for the human aerodigestive tract [20, 29], HITdb for the human gut [23] and RIM-DP for rumen [22]; 2) custom addition of compiled sequences from a specific habitat of interest to augment a broad general database, e.g., HBDB for honey bee [21], DictDB for termite and cockroach gut [27], SILVA19Rum for rumen [30] and MiDAS for activated sludge [24, 26]; 3) both a general and a habitat-specific database combined in the same pipeline, e.g., a general database followed by a most common ancestors approach with a custom species-level phylogeny of selected human-associated genera with pathogenic members [31] and FreshTrain with the TaxAss workflow for freshwater [28]. Many of these databases are used to train classifiers for taxonomy assignment.…”

Section: Introduction (mentioning, confidence: 99%)
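The third strategy in the list, a general and a habitat-specific database in one pipeline, depends on deciding which reference each ASV should be classified against. A minimal sketch of one such routing rule follows, loosely in the spirit of percent-identity-based workflows such as TaxAss but not reproducing any of the cited tools: identity is computed on pre-aligned, equal-length sequences rather than from a search tool such as BLAST, the 98% cutoff is a hypothetical value, and both classifiers are placeholder callables.

```python
from typing import Callable, Dict

Classifier = Callable[[str], str]  # hypothetical: maps an ASV to a taxonomy string

def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences;
    gap positions ('-') never count as matches."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * same / len(a)

def route_and_classify(asv: str, habitat_refs: Dict[str, str],
                       habitat_clf: Classifier, general_clf: Classifier,
                       cutoff: float = 98.0) -> str:
    """Use the habitat-specific classifier only when the ASV closely matches a
    habitat reference; otherwise fall back to the general database."""
    best = max((percent_identity(asv, ref) for ref in habitat_refs.values()), default=0.0)
    return habitat_clf(asv) if best >= cutoff else general_clf(asv)
```

The design choice is conservative: an ASV is only given the finer habitat-specific label when a close relative is actually present in the curated set, which limits forced assignments of unrelated sequences.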

Construction of habitat-specific training sets to achieve species-level assignment in 16S rRNA gene datasets

Escapa, Huang, Chen, et al. 2019 (preprint)


Background: The low cost of 16S rRNA gene sequencing facilitates population-scale molecular epidemiological studies. Existing computational algorithms can parse 16S rRNA gene sequences to high-resolution Amplicon Sequence Variants (ASVs), which represent ecologically coherent entities. Assigning species-level taxonomy to these ASVs is the critical remaining barrier to drawing ecologically/clinically relevant inferences from and comparing data across 16S rRNA gene-based microbiota studies.

Results: To overcome this barrier, we developed a broadly applicable method for constructing a phylogeny-based, high-resolution, habitat-specific training set. When used with the naïve Bayesian Ribosomal Database Project (RDP) Classifier, this training set achieved species/supraspecies-level taxonomic assignment to 16S rRNA gene-derived ASVs. The key steps for generating such a training set are 1) constructing an accurate and comprehensive phylogenetic-based, habitat-specific database; 2) compiling multiple 16S rRNA gene sequences to represent the natural sequence variability of each taxon in the database; 3) trimming the training set to match the sequenced regions if necessary; and 4) placing species sharing closely related sequences into a supraspecies taxonomic level to maintain subgenus resolution. As proof of principle, we developed a V1-V3 region training set for the bacterial microbiota of the human aerodigestive tract using our expanded Human Oral Microbiome Database (eHOMD). In addition, we overcame technical limitations to successfully use Illumina sequences for the 16S rRNA gene V1-V3 region, the most informative segment for classifying bacteria native to the human aerodigestive tract.
We also generated a full-length eHOMD 16S rRNA gene training set, which we used in conjunction with an independent PacBio Single Molecule, Real-Time (SMRT)-sequenced sinonasal dataset to validate the representation of species in our training set. The latter also established the effectiveness of a full-length training set for assigning taxonomy of long-read 16S rRNA gene datasets.

Conclusion: Here, we present a systematic approach for constructing a phylogeny-based, high-resolution, habitat-specific training set that permits species/supraspecies-level taxonomic assignment to short- and long-read 16S rRNA gene-derived ASVs. This advancement enhances the ecological and/or clinical relevance of 16S rRNA gene-based microbiota studies.

In microbiota studies of most ecosystems and/or habitats, achieving ecologically and/or clinically relevant results requires species-level identification of constituents. For example, species-level identification is often critically important for host-associated microbial communities because these often include commensal and pathogenic species of the same genus, e.g., [1, 2]. Also, some microbial genera include species that are site specialists and inhabit distinct niches of a given environment [3]. Although hig...
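Because the training sets described above are meant for the naïve Bayesian RDP Classifier, a compact sketch of that scoring scheme may help show what a training set actually feeds. It is written from the published description of the method (Wang et al., 2007), not from the RDP code: each sequence is reduced to its set of overlapping 8-mer words, each word gets a prior P_i = (n(w_i) + 0.5)/(N + 1), a taxon's word probability is (m(w_i) + P_i)/(M + 1), the query goes to the taxon with the highest summed log probability, and confidence is the share of 100 bootstrap subsets (one-eighth of the words, drawn with replacement) that return the same taxon. Class and method names are hypothetical, and the real classifier also reports confidence at every taxonomic rank, which this sketch omits.

```python
import math
import random
from collections import Counter, defaultdict

K = 8  # word (k-mer) size used by the RDP Classifier

def words(seq, k=K):
    """Set of overlapping k-mers ("words") in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

class NaiveBayesKmerClassifier:
    """Simplified sketch of RDP-style naive Bayesian scoring (not the RDP code)."""

    def __init__(self, training, k=K):
        # training: list of (taxon_label, sequence) pairs, e.g. a habitat-specific set
        self.k = k
        self.taxon_size = Counter()                # M: sequences per taxon
        self.word_in_taxon = defaultdict(Counter)  # m(w_i): taxon sequences containing w_i
        word_total = Counter()                     # n(w_i): all sequences containing w_i
        for taxon, seq in training:
            ws = words(seq, k)
            self.taxon_size[taxon] += 1
            self.word_in_taxon[taxon].update(ws)
            word_total.update(ws)
        n_seqs = len(training)
        # Word-specific prior P_i = (n(w_i) + 0.5) / (N + 1)
        self.prior = {w: (c + 0.5) / (n_seqs + 1) for w, c in word_total.items()}
        self.unseen_prior = 0.5 / (n_seqs + 1)     # words absent from training

    def log_likelihood(self, ws, taxon):
        """Sum of log P(w_i | G), with P(w_i | G) = (m(w_i) + P_i) / (M + 1)."""
        m_total = self.taxon_size[taxon]
        counts = self.word_in_taxon[taxon]
        return sum(
            math.log((counts[w] + self.prior.get(w, self.unseen_prior)) / (m_total + 1))
            for w in ws
        )

    def best_taxon(self, ws):
        return max(self.taxon_size, key=lambda t: self.log_likelihood(ws, t))

    def classify(self, seq, n_boot=100, seed=0):
        """Return (taxon, bootstrap confidence in [0, 1])."""
        ws = sorted(words(seq, self.k))
        if not ws:
            raise ValueError("sequence is shorter than the word size")
        call = self.best_taxon(ws)
        rng = random.Random(seed)
        subset_size = max(1, len(ws) // 8)  # one-eighth of the query's words
        agree = sum(
            self.best_taxon([rng.choice(ws) for _ in range(subset_size)]) == call
            for _ in range(n_boot)
        )
        return call, agree / n_boot
```

Under the approach in the abstract, a habitat-specific training set would enter as the `training` list, with sequences trimmed to the sequenced region (for example V1-V3) and supraspecies labels for species whose sequences cannot be separated.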
