Microbial community profiling using 16S rRNA amplicon sequencing allows for taxonomic characterization of diverse microorganisms. While amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, they often discard substantial portions of sequencing reads during quality control, particularly in datasets with large numbers of samples. We present a streamlined pipeline that integrates FastP for read trimming, HmmUFOtu for operational taxonomic unit (OTU) clustering, Vsearch for chimera checking, and Kraken2 for taxonomic assignment. To assess the pipeline’s performance, we reprocessed two published stool datasets from normal Korean populations: one with 890 and the other with 1,462 independent samples. In the first dataset, HmmUFOtu retained 93.2% of over 104 million read pairs after quality trimming, discarding chimeric or unclassifiable reads, while DADA2, a commonly used ASV method, retained only 44.6% of the reads. Nonetheless, both methods yielded qualitatively similar β-diversity plots. For the second dataset, HmmUFOtu retained 89.2% of read pairs, while DADA2 retained only 18.4% of the reads. Because HmmUFOtu is a closed-reference clustering method, it facilitates merging separately processed datasets: OTUs shared between the two datasets showed a correlation coefficient of 0.92 in total abundance (log scale). While the first two dimensions of the β-diversity plot showed a cohesive mixture of the two datasets, the third dimension revealed a batch effect. Our comparative evaluation of ASV and OTU methods within this streamlined pipeline provides insights into their performance when processing large-scale microbial 16S rRNA amplicon sequencing data. The strengths of HmmUFOtu and its potential for dataset merging are highlighted.
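The two summary quantities in this abstract, read retention and the log-scale correlation of shared OTU abundances, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the toy abundance tables are hypothetical, and a Pearson coefficient on log10 abundances is assumed because the abstract does not state which correlation coefficient was used.

```python
import math


def retention_percent(retained, total):
    """Percentage of input read pairs that survive processing."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * retained / total


def shared_otu_log_correlation(abund_a, abund_b):
    """Pearson r of log10 total abundance over OTUs found in both datasets.

    abund_a and abund_b map OTU identifiers to total read counts.
    Pearson is an assumption; the abstract only says "correlation coefficient".
    """
    # Closed-reference OTU identifiers are comparable across datasets,
    # so the shared set is simply the intersection of the keys.
    shared = sorted(
        otu for otu in abund_a.keys() & abund_b.keys()
        if abund_a[otu] > 0 and abund_b[otu] > 0
    )
    if len(shared) < 2:
        raise ValueError("need at least two shared OTUs with nonzero counts")

    xs = [math.log10(abund_a[otu]) for otu in shared]
    ys = [math.log10(abund_b[otu]) for otu in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("abundances are constant; correlation is undefined")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy tables: dataset B holds exactly twice the reads of A
# for every shared OTU, so the log-scale correlation is perfect.
dataset_a = {"otu1": 10, "otu2": 100, "otu3": 1000, "otu4": 5}
dataset_b = {"otu1": 20, "otu2": 200, "otu3": 2000, "otu5": 7}
r = shared_otu_log_correlation(dataset_a, dataset_b)
```

Restricting the comparison to the key intersection mirrors the point made in the abstract: a closed-reference method assigns reads to the same reference OTUs in every run, so separately processed tables can be joined on OTU identifier without re-clustering.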
Although gut microbiome dysbiosis has been associated with inflammatory bowel disease (IBD), the relationship between the oral microbiota and IBD remains poorly understood. This study aimed to identify unique microbiome patterns in saliva from IBD patients and explore potential oral microbial markers for differentiating Crohn’s disease (CD) and ulcerative colitis (UC). A prospective cohort study recruited IBD patients (UC: n = 175, CD: n = 127) and healthy controls (HC: n = 100) to analyze their oral microbiota using 16S rRNA gene sequencing. Sparse partial least squares discriminant analysis (sPLS-DA) models were trained on the sequencing data to classify CD and UC. Taxonomic classification using Kraken2 and the SILVA reference database yielded 4041 phylotypes. After quality filtering, 398 samples (UC: n = 175, CD: n = 124, HC: n = 99) and 2711 phylotypes were included. Alpha diversity analysis revealed significantly reduced richness in the microbiome of IBD patients compared to healthy controls. The sPLS-DA model achieved high accuracy (mean accuracy: 0.908; AUC: 0.966) in distinguishing IBD vs. HC, as well as good accuracy (0.846) and AUC (0.923) in differentiating CD vs. UC. These findings highlight distinct oral microbiome patterns in IBD and provide insights into potential diagnostic markers.
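For readers unfamiliar with the two metrics reported above, the following minimal Python sketch shows how accuracy and AUC are computed from a classifier's outputs. It does not reproduce the sPLS-DA model or the authors' evaluation procedure; the function names and the toy labels and scores are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via its rank interpretation.

    AUC equals the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample,
    with ties counted as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = IBD, 0 = healthy control.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
acc = accuracy([1, 0, 1, 0], [1, 0, 0, 0])
```

Accuracy depends on a single decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why abstracts such as this one commonly report both.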
Background: Although gut microbiome dysbiosis has been shown to be associated with inflammatory bowel disease (IBD), less is known about the relationship between the oral microbiota and IBD. This study aimed to elucidate unique microbiome patterns in saliva from patients with IBD and investigate potential oral microbial markers for differentiating Crohn’s disease (CD) and ulcerative colitis (UC). Methods: A multicenter, prospective cohort study recruited patients with IBD (UC, n = 175; CD, n = 127) and unrelated healthy controls (HC, n = 100) to examine microbiota within the oral microenvironment. We used 16S rRNA gene sequencing data as features for training sparse partial least squares discriminant analysis (sPLS-DA) models to classify CD and UC. Results: The V3-V4 amplicon reads of the saliva 16S rRNA sequencing data were taxonomically classified into a total of 2839 taxa (2270 genera) using Kraken2 with the SILVA 138.1 reference. Sequences that could not be classified down to the family level were removed, as were samples with a sequencing depth below 30000, resulting in 2616 taxa for 390 samples (UC, n = 168; CD, n = 124; HC, n = 98). Alpha diversity analysis revealed that the microbiome of IBD patients was significantly less rich than that of healthy controls, while CD samples were slightly richer than UC samples (Figure 1; observed richness, P = 0.01; Shannon index, P = 0.02; Chao index, P = 0.0001). An sPLS-DA model with 470 taxa as features distinguished IBD from controls with high performance (AUROC = 0.9774), while a separate sPLS-DA model with 130 features classified CD vs. UC with an AUROC of 0.8755 (Figures 2 and 3). Conclusion: Collectively, oral microbial profiles can serve as a diagnostic marker to discriminate patients with IBD from HC, and patients with CD from those with UC.
Because oral samples are easier to obtain than stool samples or intestinal biopsies, an opportunity exists to perform oral microbiome-based studies in larger cohorts, preferably in a longitudinal fashion.
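The depth filter and the alpha diversity measures named in the Results can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' pipeline: the function names and toy count vectors are hypothetical, the Shannon index is assumed to use the natural logarithm, and the "Chao index" is assumed to be the bias-corrected Chao1 estimator, none of which the abstract specifies. Only the 30000-read threshold comes from the text.

```python
import math

MIN_DEPTH = 30000  # depth threshold stated in the abstract


def filter_by_depth(samples, min_depth=MIN_DEPTH):
    """Keep samples whose total read count is at least min_depth.

    samples maps a sample name to a list of per-taxon read counts.
    """
    return {
        name: counts
        for name, counts in samples.items()
        if sum(counts) >= min_depth
    }


def observed_richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with reads."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of taxa seen exactly once and exactly twice.
    Assumed form; the abstract only says "Chao index".
    """
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed_richness(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical toy data: sample "b" falls below the depth threshold.
toy_samples = {"a": [20000, 15000, 0], "b": [100, 200, 50]}
kept = filter_by_depth(toy_samples)
richness_a = observed_richness(kept["a"])
```

Observed richness and Chao1 both measure how many taxa are present, with Chao1 adding an estimate of unseen taxa from the singleton and doubleton counts, while the Shannon index also reflects how evenly reads are spread across taxa.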