Background: Estimating and accounting for hidden variables is widely practiced as an important step in molecular quantitative trait locus (molecular QTL, henceforth “QTL”) analysis for improving the power of QTL identification. However, few benchmark studies have been performed to evaluate the efficacy of the various methods developed for this purpose. Results: Here we benchmark popular hidden variable inference methods including surrogate variable analysis (SVA), probabilistic estimation of expression residuals (PEER), and hidden covariates with prior (HCP) against principal component analysis (PCA), a well-established dimension reduction and factor discovery method, via 362 synthetic and 110 real data sets. We show that PCA not only underlies the statistical methodology behind the popular methods but is also orders of magnitude faster, better-performing, and much easier to interpret and use. Conclusions: To help researchers use PCA in their QTL analysis, we provide an R package along with a detailed guide, both of which are freely available at https://github.com/heatherjzhou/PCAForQTL. We believe that using PCA rather than SVA, PEER, or HCP will substantially improve and simplify hidden variable inference in QTL mapping as well as increase the transparency and reproducibility of QTL research.
Estimating and accounting for hidden variables is widely practiced as an important step in quantitative trait locus (QTL) analysis for improving the power of QTL identification. Here we benchmark popular hidden variable inference methods including surrogate variable analysis (SVA), probabilistic estimation of expression residuals (PEER), and hidden covariates with prior (HCP) against principal component analysis (PCA; a well-established dimension reduction and factor discovery method) through comprehensive simulation studies and show that PCA is faster and better-performing. In particular, we show that under realistic simulation scenarios, PEER, the most popular hidden variable inference method for QTL mapping to date, lacks the alleged advantage that its performance does not deteriorate as the number of inferred covariates increases. Additionally, our real data analysis shows that PEER produces results that are at best identical to those of PCA, is at least three orders of magnitude slower, and can fail to capture important variance components of the molecular phenotype data in certain cases. Based on these results and the fact that PCA is much easier to interpret and use, we contend that PCA should be the preferred hidden variable inference method for QTL mapping until evidence suggests otherwise. To help researchers use PCA in their QTL analysis, we provide an R package, PCAForQTL, along with a detailed tutorial.
Skin epidermis constitutes the outer permeability barrier that protects the body from dehydration, heat loss, and myriad external assaults. Mechanisms that maintain barrier integrity in constantly challenged adult skin and how epidermal dysregulation shapes the local immune microenvironment and whole‐body metabolism remain poorly understood. Here, we demonstrate that inducible and simultaneous ablation of transcription factor‐encoding Ovol1 and Ovol2 in adult epidermis results in barrier dysregulation through impacting epithelial‐mesenchymal plasticity and inflammatory gene expression. We find that aberrant skin immune activation then ensues, featuring Langerhans cell mobilization and T cell responses, and leading to elevated levels of secreted inflammatory factors in circulation. Finally, we identify failure to gain body weight and accumulate body fat as long‐term consequences of epidermal‐specific Ovol1/2 loss and show that these global metabolic changes along with the skin barrier/immune defects are partially rescued by immunosuppressant dexamethasone. Collectively, our study reveals key regulators of adult barrier maintenance and suggests a causal connection between epidermal dysregulation and whole‐body metabolism that is in part mediated through aberrant immune activation.
In this response to the correspondence by Hejblum et al. [1], we clarify the reasons why we ran the Wilcoxon rank-sum test on the semi-synthetic RNA-seq samples without normalization, and why we could only run dearseq with its built-in normalization, in our published study [2]. We also argue that no normalization should be performed on the semi-synthetic samples. Hence, for fairer method comparison and using the updated dearseq package by Hejblum et al., we re-run the six differential expression methods (DESeq2, edgeR, limma-voom, dearseq, NOISeq, and the Wilcoxon rank-sum test) without normalizing the semi-synthetic samples, i.e., under the "No normalization" scheme in [1]. Our updated results show that the Wilcoxon rank-sum test is still the best method in terms of false discovery rate (FDR) control and power performance under all settings investigated.