scRNA-seq profiles each represent a highly partial sample of mRNA molecules from a unique cell that can never be resampled, and robust analysis must separate the sampling effect from biological variance. We describe a methodology for partitioning scRNA-seq datasets into metacells: disjoint and homogenous groups of profiles that could have been resampled from the same cell. Unlike clustering analysis, our algorithm specializes in obtaining granular as opposed to maximal groups. We show how to use metacells as building blocks for complex quantitative transcriptional maps while avoiding data smoothing. Our algorithms are implemented in the MetaCell R/C++ software package.
Single cell RNA-seq (scRNA-seq) has become the method of choice for analyzing mRNA distributions in heterogeneous cell populations. scRNA-seq only partially samples the cells in a tissue and the RNA in each cell, resulting in sparse data that challenge analysis. We develop a methodology that addresses scRNA-seq's sparsity through partitioning the data into metacells: disjoint, homogenous and highly compact groups of cells, each exhibiting only sampling variance. Metacells constitute local building blocks for clustering and quantitative analysis of gene expression, while not enforcing any global structure on the data, thereby maintaining statistical control and minimizing biases. We illustrate the MetaCell framework by re-analyzing cell type and transcriptional gradients in peripheral blood and whole organism scRNA-seq maps. Our algorithms are implemented in the new MetaCell R/C++ software package.

BACKGROUND
Single cell RNA-seq (scRNA-seq) is used extensively for discovery and identification of cell types, for characterizing transcriptional states within them, and for inference of continuous gene expression gradients linking these states. These phenomenological observations are used for creating cell type atlases, and as a starting point for analysis of different cellular processes, including differentiation, cell cycle and response to stimuli [1-9] (reviewed in [10]). Key challenges in the analysis of scRNA-seq data are the discrete and variable nature of the cellular mRNA molecule census, and the sparsity of the scRNA-seq molecule count matrices aiming at its characterization. In mammals, only 10^5 to 10^6 copies of mRNA are present within each cell, representing regulated and stochastic transcriptional activities of over 20,000 genes [11]. At the same time, accurate quantitative models for the manner in which changes in the RNA abundance of specific genes distinguish cell types and cellular programs are lacking.
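To make the sampling effect concrete, the following is a minimal Python sketch, not part of the MetaCell package and not the authors' method. It builds one hypothetical cell with 20,000 genes and 200,000 mRNA molecules (within the 10^5 to 10^6 range quoted above), then draws two independent sparse profiles from that same cell. The Zipf-like gene weights, the depth of 1,000 UMIs per profile, and the function names `simulate_cell` and `sample_umis` are all illustrative assumptions.

```python
import random
from collections import Counter

N_GENES = 20_000        # order of magnitude of mammalian gene count (from the text)
N_MOLECULES = 200_000   # assumed mRNA pool size, within the 10^5 to 10^6 range
N_UMIS = 1_000          # assumed sampling depth per profile (illustrative)


def simulate_cell(n_genes=N_GENES, n_molecules=N_MOLECULES, seed=0):
    """Build a hypothetical mRNA pool: a list with one gene index per molecule."""
    rng = random.Random(seed)
    # Zipf-like weights so that a few genes dominate; an assumption for illustration.
    weights = [1.0 / (rank + 1) for rank in range(n_genes)]
    return rng.choices(range(n_genes), weights=weights, k=n_molecules)


def sample_umis(molecules, n_umis, seed):
    """Draw n_umis molecules without replacement and count them per gene."""
    rng = random.Random(seed)
    return Counter(rng.sample(molecules, n_umis))


cell = simulate_cell()
# Two profiles drawn from the *same* pool: they differ only by sampling.
profile_a = sample_umis(cell, N_UMIS, seed=1)
profile_b = sample_umis(cell, N_UMIS, seed=2)

# Fraction of genes with zero counts in one profile (sparsity of the count vector).
zero_fraction = 1 - len(profile_a) / N_GENES
# Genes detected in both profiles versus in either one.
shared = len(set(profile_a) & set(profile_b))
either = len(set(profile_a) | set(profile_b))

print(f"genes with zero counts in profile A: {zero_fraction:.1%}")
print(f"genes detected in both profiles: {shared} of {either}")
```

Any difference between `profile_a` and `profile_b` here is sampling variance alone, since both come from an identical molecule pool. This is the kind of variation that, in the text's terms, should remain within a metacell.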
In many cases RNA abundances can vary significantly due to biological noise, without a major effect on the cell's current and future function. In other cases, genes with very low RNA molecule counts can have critical impact on the regulation of key programs. Finally, scRNA-seq technologies sample these stochastic mRNA pools sparsely, typically deriving anywhere from 10,000 unique molecule identifiers (UMIs) from larger mammalian cells to fewer than 1,000 UMIs for many important populations of smaller or
New antimycotic drugs are challenging to find, as potential target proteins may have close human orthologs. Here we focus on identifying metabolic targets that are critical for fungal growth and have minimal similarity to human proteins. We compare and combine: (I) direct metabolic network modeling using elementary mode analysis and flux estimate approximations from expression data, (II) targeting of metabolic genes by transcriptome analysis of condition-specific highly expressed enzymes, and (III) analysis of enzyme structure, enzyme interconnectedness (“hubs”), and identification of pathogen-specific enzymes using orthology relations. We have identified 64 targets, including metabolic enzymes involved in vitamin synthesis and in lipid and amino acid biosynthesis. These comprise 18 targets validated from the literature, two validated and five currently being examined in our own genetic experiments, and 38 further promising novel target proteins that are non-orthologous to human proteins, are involved in metabolism, and are highly ranked drug targets from these pipelines.
Background: Only a small portion of human long non-coding RNAs (lncRNAs) appear to be conserved outside of mammals, but the events underlying the birth of new lncRNAs in mammals remain largely unknown. One potential source is remnants of protein-coding genes that transitioned into lncRNAs.
Results: We systematically compare lncRNA and protein-coding loci across vertebrates, and estimate that up to 5% of conserved mammalian lncRNAs are derived from lost protein-coding genes. These lncRNAs have specific characteristics, such as broader expression domains, that set them apart from other lncRNAs. Fourteen lncRNAs have sequence similarity with the loci of the contemporary homologs of the lost protein-coding genes. We propose that selection acting on enhancer sequences is mostly responsible for retention of these regions. As an example of an RNA element from a protein-coding ancestor that was retained in the lncRNA, we describe in detail a short translated ORF in the JPX lncRNA that was derived from an upstream ORF in a protein-coding gene and retains some of its functionality.
Conclusions: We estimate that ~55 annotated conserved human lncRNAs are derived from parts of ancestral protein-coding genes, and loss of coding potential is thus a non-negligible source of new lncRNAs. Some lncRNAs inherited regulatory elements influencing transcription and translation from their protein-coding ancestors and those elements can influence the expression breadth and functionality of these lncRNAs.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-017-1293-0) contains supplementary material, which is available to authorized users.
Propagation of clonal regulatory programs contributes to cancer development. It is poorly understood how epigenetic mechanisms interact with genetic drivers to shape this process. Here we combine single-cell analysis of transcription and DNA methylation with a Luria-Delbrück experimental design to demonstrate the existence of clonally stable epigenetic memory in multiple types of cancer cells. Longitudinal transcriptional and genetic analysis of clonal colon cancer cell populations reveals a slowly drifting spectrum of epithelial-to-mesenchymal transcriptional identities that is seemingly independent of genetic variation. DNA methylation landscapes correlate with these identities but also reflect an independent clock-like methylation loss process. Methylation variation can be explained as an effect of global trans-acting factors in most cases. However, for a specific class of promoters, in particular cancer testis antigens (CTA), de-repression is correlated with and likely driven by loss of methylation in cis. This study indicates how genetic sub-clonal structure in cancer cells can be diversified by epigenetic memory.