Molecular mechanisms driving disease course and response to therapy in ulcerative colitis (UC) are not well understood. Here, we use RNAseq to define pre-treatment rectal gene expression and fecal microbiota profiles in 206 pediatric UC patients receiving standardised therapy. We validate our key findings in adult and pediatric UC cohorts of 408 participants. We observe a marked suppression of mitochondrial genes and function across cohorts in active UC, and find that increasing disease severity is marked by enrichment of adenoma/adenocarcinoma and innate immune genes. A subset of severity genes improves prediction of corticosteroid-induced remission in the discovery cohort; this gene signature is also associated with response to anti-TNFα and anti-α4β7 integrin in adults. The severity and therapeutic response gene signatures were in turn associated with shifts in microbes previously implicated in mucosal homeostasis. Our data provide insights into UC pathogenesis, and may prioritise future therapies for nonresponders to current approaches.
Background: Correcting a heterogeneous dataset that presents artefacts from several confounders is often an essential bioinformatics task. Attempting to remove these batch effects will result in some biologically meaningful signals being lost. Thus, a central challenge is assessing whether the removal of unwanted technical variation harms the biological signal that is of interest to the researcher.
Results: We describe a novel framework, B-CeF, to evaluate the effectiveness of batch correction methods and their tendency toward over- or under-correction. The approach is based on comparing co-expression of adjusted gene-gene pairs to a priori knowledge of highly confident gene-gene associations, derived from an external reference based on thousands of unrelated experiments. Our framework includes three steps: (1) data adjustment with the desired methods; (2) calculating gene-gene co-expression measurements for the adjusted datasets; (3) evaluating the performance of the co-expression measurements against a gold standard. Using the framework, we evaluated five batch correction methods applied to RNA-seq data of six representative tissue datasets derived from the GTEx project.
Conclusions: Our framework enables the evaluation of batch correction methods to better preserve the original biological signal. We show that using a multiple linear regression model to correct for known confounders outperforms factor analysis-based methods that estimate hidden confounders. The code is publicly available as an R package.
Electronic supplementary material: The online version of this article (10.1186/s12859-019-2855-9) contains supplementary material, which is available to authorized users.
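
The three steps described in the Results can be illustrated with a minimal Python sketch. This is not the B-CeF R package or its API. The function names (`regress_out`, `coexpression`, `auc_against_gold`), the use of absolute Pearson correlation as the co-expression measure, the use of AUC as the evaluation score, and the synthetic toy data are all assumptions made for illustration. The adjustment step uses multiple linear regression on known confounders, the approach the abstract reports as performing best.

```python
import numpy as np

def regress_out(expr, confounders):
    """Step 1 (assumed form): remove known confounders from expr
    (samples x genes) with an ordinary least-squares fit per gene."""
    n = expr.shape[0]
    design = np.column_stack([np.ones(n), confounders])
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    # Subtract only the confounder contribution; keep each gene's intercept.
    return expr - design[:, 1:] @ coef[1:]

def coexpression(expr):
    """Step 2 (assumed measure): absolute Pearson correlation, genes x genes."""
    return np.abs(np.corrcoef(expr, rowvar=False))

def auc_against_gold(coexpr, gold_pairs):
    """Step 3 (assumed score): AUC, i.e. the probability that a gold-standard
    gene pair has higher co-expression than a non-gold pair (ties count 0.5)."""
    rows, cols = np.triu_indices(coexpr.shape[0], k=1)
    scores = coexpr[rows, cols]
    gold = {tuple(sorted((int(a), int(b)))) for a, b in gold_pairs}
    labels = np.array([(int(i), int(j)) in gold for i, j in zip(rows, cols)])
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one gold and one non-gold pair")
    # Direct pairwise comparison; fine for a toy example, too slow for real data.
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Synthetic toy data (hypothetical): 4 genes, 200 samples, one binary batch.
rng = np.random.default_rng(0)
n_samples = 200
batch = rng.integers(0, 2, n_samples).astype(float)
signal = rng.normal(size=n_samples)          # shared biology of genes 0 and 1
expr = np.column_stack([
    signal + 0.3 * rng.normal(size=n_samples),
    signal + 0.3 * rng.normal(size=n_samples),
    rng.normal(size=n_samples),              # gene 2: unrelated to the others
    rng.normal(size=n_samples),              # gene 3: unrelated to the others
]) + 3.0 * batch[:, None]                    # batch effect added to every gene

gold_pairs = [(0, 1)]                        # stand-in for the external reference

raw_coexpr = coexpression(expr)
adjusted = regress_out(expr, batch)
adj_coexpr = coexpression(adjusted)
adj_auc = auc_against_gold(adj_coexpr, gold_pairs)

print("gene 2 vs gene 3 co-expression, raw:     ", round(raw_coexpr[2, 3], 3))
print("gene 2 vs gene 3 co-expression, adjusted:", round(adj_coexpr[2, 3], 3))
print("AUC against gold standard after adjustment:", adj_auc)
```

In this toy setting the batch effect makes the unrelated genes 2 and 3 look co-expressed, and regressing out the known confounder removes that spurious association while the gold-standard pair stays strongly correlated. Comparing the score across several adjustment methods, as the framework does, would mean running steps 2 and 3 once per method.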