Batch effects in microbiome data arise from differential processing of specimens and can lead to spurious findings and obscure true signals. Strategies designed for genomic data to mitigate batch effects usually fail to accommodate zero-inflated and over-dispersed microbiome data. Most strategies tailored for microbiome data are restricted to association testing or specialized study designs, and do not support other analytic goals or general designs. Here, we develop the Conditional Quantile Regression (ConQuR) approach to remove microbiome batch effects using a two-part quantile regression model. ConQuR is a comprehensive method that accommodates the complex distributions of microbial read counts by non-parametric modeling, and it generates batch-removed zero-inflated read counts that can be used in, and benefit, the usual subsequent analyses. We apply ConQuR to simulated and real microbiome datasets and demonstrate its advantages in removing batch effects while preserving the signals of interest.

Advances in 16S rRNA [1] and full metagenome [2] sequencing technologies have enabled large-scale human microbiome profiling studies involving hundreds to thousands of individuals. The large sample sizes of these studies and the rich availability of metadata promise a comprehensive understanding of the role of microorganisms in health and disease. These studies have already revealed associations between bacterial taxa and both diseases and exposures, such as obesity [3], type 2 diabetes [4], bacterial vaginosis [5], antibiotics [6], and environmental pollutants [7]. However, although large-scale studies facilitate more robust and powerful analyses, they are often subject to serious batch effects, that is, systematic variation in the data originating from differential handling and processing of specimens [8]. Many large studies include samples collected across times or locations and processed in different runs.
In a more extreme situation, several studies may be pooled together for integrative analysis, with inter-study heterogeneity introducing even more severe variation. These batch effects pose serious challenges to analysis and can lead to excessive false positive discoveries, obscure true associations between microbes and clinical variables, and hinder prediction modeling and biomarker development. Unfortunately, despite the importance of batch effects,