Tissue or cell subtype-specific and differentially-expressed genes (SDEGs) are defined as being differentially expressed in a particular tissue or cell subtype among multiple subtypes. Detecting SDEGs plays a critical role in molecularly characterizing and identifying tissue or cell subtypes, and in facilitating supervised deconvolution of complex tissues. Unfortunately, classic differential analysis assumes a convenient null hypothesis and an associated test statistic that are subtype-non-specific, resulting in a high false positive rate and/or lower detection power with respect to particular subtypes. Here we introduce the One-Versus-Everyone Fold Change (OVE-FC) test for detecting SDEGs. To assess the statistical significance of this test, we also propose the scaled test statistic OVE-sFC together with a mixture null distribution model and a tailored permutation scheme. Validated with realistic synthetic data sets on both type 1 error and detection power, the OVE-FC/sFC test, applied to two benchmark gene expression data sets, detects many known and de novo SDEGs. Subsequent supervised deconvolution results, obtained using the SDEGs detected by the OVE-FC/sFC test, showed superior deconvolution accuracy when compared with popular peer methods.
Motivation: Ideally, a molecularly distinct subtype would be composed of molecular features that are expressed uniquely in the subtype of interest but in no others, the so-called marker genes (MGs). MGs play a critical role in the characterization, classification or deconvolution of tissue or cell subtypes. We and others have recognized that the test statistics used by most methods do not exactly satisfy the MG definition and often identify inaccurate MGs. Results: We report an efficient and accurate data-driven method, formulated as a Cosine-based One-sample Test (COT) in scatter space, to detect MGs among many subtypes using subtype expression profiles. Fundamentally different from existing approaches, the test statistic in COT precisely matches the mathematical definition of an ideal MG. We demonstrate the performance and utility of COT on both simulated and real gene expression and proteomics data. The open source Python/R tool will allow biologists to efficiently detect MGs and perform a more comprehensive and unbiased molecular characterization of tissue or cell subtypes in many biomedical contexts. Nevertheless, COT complements, rather than replaces, existing methods. Availability and implementation: The Python COT software, with a detailed user's manual and a vignette, is freely available at https://github.com/MintaYLu/COT. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Among multiple subtypes of tissue or cell, subtype-specific differentially-expressed genes (SDEGs) are defined as being most-upregulated in only one subtype but not in any other. Detecting SDEGs plays a critical role in the molecular characterization and deconvolution of multicellular complex tissues. Classic differential analysis assumes a null hypothesis whose test statistic is not subtype-specific, and thus can produce a high false positive rate and/or lower detection power. Here we first introduce a One-Versus-Everyone Fold Change (OVE-FC) test for detecting SDEGs. We then propose a scaled test statistic (OVE-sFC) for assessing the statistical significance of SDEGs that applies a mixture null distribution model and a tailored permutation test. The OVE-FC/sFC test was validated on both type 1 error rate and detection power using extensive simulation data sets generated from real gene expression profiles of purified subtype samples. The OVE-FC/sFC test was then applied to two benchmark gene expression data sets of purified subtype samples and detected many known or previously unknown SDEGs. Subsequent supervised deconvolution results on synthesized bulk expression data, obtained using the SDEGs detected from the independent purified expression data by the OVE-FC/sFC test, showed superior deconvolution accuracy when compared with popular peer methods.
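The one-versus-everyone idea described above can be illustrated with a minimal sketch. It assumes, since the abstract does not give the formula, that the statistic for a gene is the gap between the highest subtype mean and the next-highest subtype mean on a log2 scale, so a gene scores well only if its top subtype exceeds every other subtype. The function name `ove_fc` and the toy values are hypothetical, and the scaled statistic (OVE-sFC), the mixture null model, and the permutation test are not reproduced here.

```python
from statistics import mean

def ove_fc(log_expr_by_subtype):
    """Sketch of a one-versus-everyone log fold change for a single gene.

    log_expr_by_subtype: dict mapping subtype name -> list of log2 expression
    values from purified samples of that subtype (hypothetical input layout).
    Returns (top_subtype, score), where score is the top subtype mean minus
    the second-highest subtype mean. This is an assumed form of the statistic.
    """
    if len(log_expr_by_subtype) < 2:
        raise ValueError("need at least two subtypes")
    means = {name: mean(values) for name, values in log_expr_by_subtype.items()}
    # Rank subtypes by mean expression, highest first.
    ranked = sorted(means, key=means.get, reverse=True)
    top, runner_up = ranked[0], ranked[1]
    # Comparing against the runner-up is the same as taking the minimum
    # pairwise gap between the top subtype and every other subtype.
    return top, means[top] - means[runner_up]

# Toy gene: clearly highest in subtype "A".
gene = {
    "A": [7.5, 8.0, 8.5],
    "B": [5.0, 5.5, 4.5],
    "C": [6.0, 6.5, 5.5],
}
subtype, score = ove_fc(gene)
print(subtype, score)
```

In this toy example a pooled one-versus-rest comparison would contrast subtype A against the average of B and C, whereas the sketch contrasts A against C alone, the closest competitor, which is the stricter requirement.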
Motivation: Identification of biological pathways plays a central role in understanding both human health and diseases. Although much work has previously been done to explore biological pathways using single-omics data, little effort has been reported using multi-omics data integration, mainly due to methodological and technological limitations. Compared to single-omics data, multi-omics data will help identify disease-specific functional pathways with both higher sensitivity and specificity, thus yielding more comprehensive insights into the molecular architecture of disease processes. Results: In this paper, we propose two computational approaches that integrate multi-omics data and identify disease-specific biological pathways with high sensitivity and specificity. Applying our methods to an experimental multi-omics dataset on muscular dystrophy subtypes, we identified disease-specific pathways of high biological plausibility. The developed methodology will likely have a broad impact on improving the molecular characterization of many common diseases. Contact: yuewang@vt.edu Supplementary information: Supplementary information attached.
We develop an accurate and efficient method to detect marker genes among many subtypes using subtype-enriched expression profiles. We implement Cosine-based One-sample Test (COT) Python software that is easy to use and applicable to multi-omics data. We demonstrate the performance and utility of COT on gene expression and proteomics data acquired from tissue or cell subtypes. COT is formulated as a one-sample test with a cosine similarity test statistic in scatter space, and the de novo marker genes it detects will allow biologists to perform a more comprehensive and unbiased molecular characterization, deconvolution and classification of complex tissue or cell subtypes.
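A minimal sketch of a cosine similarity statistic of this kind is given here. It assumes that each gene is represented by its vector of non-negative subtype mean expression values, and that an ideal marker gene for subtype k lies on the k-th coordinate axis, so the statistic is the cosine of the angle between the gene vector and its nearest axis. The function name `cot_statistic` and the toy vectors are hypothetical, and this is not the API of the COT package; the one-sample test that converts the statistic into a significance level is not shown.

```python
import math

def cot_statistic(subtype_means):
    """Sketch of a cosine-to-nearest-axis statistic for a single gene.

    subtype_means: list of non-negative mean expression values, one per
    subtype (an assumed input layout, linear scale rather than log scale).
    Returns (k, cosine), where k is the index of the subtype with the
    highest mean and cosine is subtype_means[k] / ||subtype_means||.
    A cosine of 1.0 means the gene is expressed in exactly one subtype.
    """
    norm = math.sqrt(sum(x * x for x in subtype_means))
    if norm == 0.0:
        # Gene not expressed anywhere: no direction, so no marker evidence.
        return None, 0.0
    k = max(range(len(subtype_means)), key=lambda i: subtype_means[i])
    # The cosine with the unit axis vector e_k reduces to x_k / ||x||.
    return k, subtype_means[k] / norm

ideal = cot_statistic([0.0, 5.0, 0.0])   # expressed in one subtype only
mixed = cot_statistic([3.0, 4.0, 0.0])   # shared between two subtypes
print(ideal, mixed)
```

Because the cosine depends only on the direction of the vector, a weakly expressed gene and a strongly expressed gene with the same subtype pattern receive the same score under this sketch.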