Identifying differentially abundant microbes is a common goal of microbiome studies. Multiple methods are used interchangeably for this purpose in the literature. Yet, there are few large-scale studies systematically exploring the appropriateness of using these tools interchangeably, and the scale and significance of the differences between them. Here, we compare the performance of 14 differential abundance testing methods on 38 16S rRNA gene datasets with two sample groups. We test for differences in amplicon sequence variants and operational taxonomic units (collectively, ASVs) between these groups. Our findings confirm that these tools identified drastically different numbers and sets of significant ASVs, and that results depend on data pre-processing. For many tools the number of features identified correlates with aspects of the data, such as sample size, sequencing depth, and effect size of community differences. ALDEx2 and ANCOM-II produce the most consistent results across studies and agree best with the intersection of results from different approaches. Nevertheless, we recommend that researchers use a consensus approach based on multiple differential abundance methods to help ensure robust biological interpretations.
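The consensus recommendation above can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `consensus_features`, the `min_tools` threshold, the placeholder tool `tool_C`, and the ASV labels are all hypothetical, and each tool is assumed to have already produced its own set of significant ASVs.

```python
from collections import Counter


def consensus_features(results, min_tools=2):
    """Return the features called significant by at least `min_tools` tools.

    `results` maps a tool name to the collection of features that tool
    reported as significant (hypothetical input format).
    """
    # Count, for each feature, how many tools reported it.
    counts = Counter(f for sig in results.values() for f in set(sig))
    return {f for f, n in counts.items() if n >= min_tools}


# Hypothetical per-tool calls; these are not results from the study.
calls = {
    "ALDEx2": {"ASV1", "ASV2"},
    "ANCOM-II": {"ASV1", "ASV2", "ASV3"},
    "tool_C": {"ASV1", "ASV3", "ASV4", "ASV5"},
}

print(sorted(consensus_features(calls, min_tools=2)))
```

Setting `min_tools` to the number of tools reduces the consensus to the strict intersection of all result sets; lower thresholds trade stringency for sensitivity.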
Background Crohn’s disease (CD) has an unclear etiology, but there is growing evidence of a direct link with a dysbiotic microbiome. Many gut microbes have previously been associated with CD, but these have mainly been confounded with patients’ ongoing treatments. Additionally, most analyses of CD patients’ microbiomes have focused on microbes in stool samples, which yield different insights than profiling biopsy samples. Results We sequenced the 16S rRNA gene (16S) and carried out shotgun metagenomics (MGS) from the intestinal biopsies of 20 treatment-naïve CD and 20 control pediatric patients. We identified the abundances of microbial taxa and inferred functional categories within each dataset. We also identified known human genetic variants from the MGS data. We then used a machine learning approach to determine the classification accuracy when these datasets, collapsed to different hierarchical groupings, were used independently to classify patients by disease state and by CD patients’ response to treatment. We found that 16S-identified microbes could classify patients with higher accuracy in both cases. Based on follow-ups with these patients, we identified which microbes and functions were best for predicting disease state and response to treatment, including several previously identified markers. By combining the top features from all significant models into a single model, we could compare the relative importance of these predictive features. We found that 16S-identified microbes are the best predictors of CD state whereas MGS-identified markers perform best for classifying treatment response. Conclusions We demonstrate for the first time that useful predictors of CD treatment response can be produced from shotgun MGS sequencing of biopsy samples despite the complications related to large proportions of host DNA. The top predictive features that we identified in this study could be useful for building an improved classifier for CD and treatment response based on sufferers’ microbiome in the future.
The BISCUIT project is funded by a Clinical Academic Fellowship from the Chief Scientist Office (Scotland) (CAF/08/01). Electronic supplementary material The online version of this article (10.1186/s40168-018-0398-3) contains supplementary material, which is available to authorized users.
Background The gut microbiome is extensively involved in induction of remission in pediatric Crohn’s disease (CD) patients by exclusive enteral nutrition (EEN). In this follow-up study of pediatric CD patients undergoing treatment with EEN, we employ machine learning models trained on baseline gut microbiome data to distinguish patients who achieved and sustained remission (SR) from those who did not achieve remission or who relapsed (non-SR) by 24 weeks. Methods A total of 139 fecal samples were obtained from 22 patients (8–15 years of age) for up to 96 weeks. Gut microbiome taxonomy was assessed by 16S rRNA gene sequencing, and functional capacity was assessed by metagenomic sequencing. We used standard metrics of diversity and taxonomy to quantify differences between SR and non-SR patients and to associate gut microbial shifts with fecal calprotectin (FCP) and disease severity as defined by the weighted Pediatric Crohn’s Disease Activity Index. We used microbial data sets in addition to clinical metadata in random forest (RF) models to classify treatment response and predict FCP levels. Results Microbial diversity did not change after EEN, but species richness was lower in low-FCP samples (<250 µg/g). An RF model using microbial abundances, species richness, and Paris disease classification was the best at classifying treatment response (area under the curve [AUC] = 0.9). KEGG Pathways also significantly classified treatment response with the addition of the same clinical data (AUC = 0.8). Top features of the RF model are consistent with previously identified IBD taxa, such as Ruminococcaceae and Ruminococcus gnavus. Conclusions Our machine learning approach is able to distinguish SR and non-SR samples using baseline microbiome and clinical data.
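For readers unfamiliar with the AUC values quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores. It is not the authors' pipeline: the function name `roc_auc`, the label coding (1 = SR, 0 = non-SR), and the example scores are assumptions made for illustration only.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from binary labels and classifier scores.

    Uses the rank interpretation of AUC: the probability that a randomly
    chosen positive sample scores higher than a randomly chosen negative
    one, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Compare every positive score against every negative score.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = SR, 0 = non-SR) and predicted probabilities.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
print(round(roc_auc(labels, scores), 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups, which gives some context for the reported values of 0.9 and 0.8.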