Single-cell RNA-sequencing (scRNA-seq) is a powerful tool to quantify transcriptional states in thousands to millions of cells. It is increasingly common for scRNA-seq data to be collected in multiple conditions to measure the effect of an experimental perturbation. However, quantifying differences between scRNA-seq datasets remains an analytical challenge. Previous efforts at quantifying such differences focus on discrete regions of the transcriptional state space, such as clusters of cells. Here, we describe a continuous measure of the effect of an experiment across the transcriptomic space with single-cell resolution. First, we use the manifold assumption to model the cellular state space as a graph with cells as nodes and edges connecting cells with similar transcriptomic profiles. Next, we calculate an Enhanced Experimental Signal (EES) that estimates the likelihood of observing cells from each condition at every point in the manifold. We show that the EES has useful properties for the analysis of single-cell perturbation studies. Using the magnitude and frequency of the EES in an algorithm we call vertex frequency clustering, we identify specific populations of cells that are or are not affected by an experimental treatment at the appropriate level of granularity. From these selected populations we can derive gene signatures of the affected cells. We demonstrate both algorithms using a combination of biological and synthetic datasets. Implementations are provided in the MELD Python package, which is available at https://github.com/KrishnaswamyLab/MELD.
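The two steps described in the abstract (building a cell-similarity graph, then estimating a per-cell condition signal on it) can be illustrated with a minimal NumPy sketch. This is not the MELD implementation or its API: the function names `knn_graph` and `smooth_condition_signal`, the unweighted k-nearest-neighbour graph, and the low-pass filter (I + beta L)^{-1} are all assumptions chosen for illustration, and the data are synthetic.

```python
import numpy as np

def knn_graph(X, k=5):
    """Symmetric, unweighted k-nearest-neighbour adjacency matrix (illustrative)."""
    # Pairwise squared Euclidean distances between cells (rows of X).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # a cell is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest cells
    n = X.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)             # symmetrize: edge if either side picks it

def smooth_condition_signal(A, labels, beta=1.0):
    """Low-pass filter a 0/1 condition indicator over the graph.

    Solves (I + beta * L) y = x, where L is the combinatorial graph Laplacian.
    The filter choice is an assumption made for this sketch.
    """
    L = np.diag(A.sum(axis=1)) - A
    x = labels.astype(float)
    return np.linalg.solve(np.eye(len(x)) + beta * L, x)

# Synthetic example: two cell populations with different condition mixes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),    # population 1
               rng.normal(6.0, 1.0, (50, 2))])   # population 2
# 1 = treatment, 0 = control: population 1 is 80% treatment, population 2 is 20%.
labels = np.concatenate([np.ones(40), np.zeros(10), np.ones(10), np.zeros(40)])

A = knn_graph(X, k=5)
signal = smooth_condition_signal(A, labels, beta=1.0)
print(signal[:50].mean(), signal[50:].mean())
```

Each cell's smoothed value is a weighted average of the raw condition labels around it on the graph, so cells in the treatment-enriched population receive higher values than cells in the control-enriched one. Larger `beta` smooths more aggressively.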
Single-cell RNA-sequencing (scRNA-seq) is a powerful tool to quantify transcriptional states in thousands to millions of cells. It is increasingly common for scRNA-seq data to be collected in multiple experimental conditions, yet quantifying differences between scRNA-seq datasets remains an analytical challenge. Previous efforts at quantifying such differences focus on discrete regions of the transcriptional state space such as clusters of cells. Here, we describe a continuous measure of the effect of an experiment across the transcriptomic space. First, we use the manifold assumption to model the cellular state space as a graph (or network) with cells as nodes and edges connecting cells with similar transcriptomic profiles. Next, we create an Enhanced Experimental Signal (EES) that estimates the likelihood of observing cells from each condition at every point in the manifold. We show that the EES has useful properties and information that can be extracted. The EES can be used to identify how gene expression is affected by a given perturbation, including identifying non-monotonic changes from only two conditions. We also show that we can use both the magnitude and frequency of the EES, using an algorithm we call vertex frequency clustering, to derive subsets of cells at appropriate levels of granularity (tailored to areas that change) that are enriched in the experimental or control conditions or that are unaffected between conditions. We demonstrate both algorithms using a combination of biological and synthetic datasets. Implementations are provided in the MELD Python package, which is available at https://github.com/KrishnaswamyLab/MELD.
As single-cell RNA-sequencing (scRNA-seq) has become more accessible, the design of single-cell experiments has become increasingly complex. Researchers regularly use scRNA-seq to quantify the effect of a drug, gene knockout, or other experimental perturbation on a biological system. However, quantifying the compositional differences between single-cell datasets collected from multiple experimental conditions remains an analytical challenge [1] because of the heterogeneity and noise in both the data and the effects of a given perturbation.
Previous work has shown the utility of modelling the transcriptomic state space as a continuous low-dimensional manifold, or set of manifolds, to characterize cellular heterogeneity and dynamic biological processes [2][3][4][5][6][7][8]. In the manifold model, the biologically valid combinations of gene expression are represented as a smooth, low-dimensional surface in a high dimensional space, such as a two-dimensional sheet embedded in three dimensions. The main challenge in developing tools to quantify compositional differences between single-cell datasets is that each dataset comprises several intrinsic structures of heterogeneous cells, and the effect of the experimental condition could be diffuse or isolated to particular areas of...
Diffusion maps are a commonly used kernel-based method for manifold learning, which can reveal intrinsic structures in data and embed them in low dimensions. However, as with most kernel methods, their implementation requires a heavy computational load, reaching up to cubic complexity in the number of data points. This limits their usability in modern data analysis. Here, we present a new approach to computing the diffusion geometry, and related embeddings, from a compressed diffusion process between data regions rather than data points. Our construction is based on an adaptation of the previously proposed measure-based Gaussian correlation (MGC) kernel that robustly captures the local geometry around data points. We use this MGC kernel to efficiently compress diffusion relations from pointwise to data region resolution. Finally, a spectral embedding of the data regions provides coordinates that are used to interpolate and approximate the pointwise diffusion map embedding of data. We analyze theoretical connections between our construction and the original diffusion geometry of diffusion maps, and demonstrate the utility of our method in analyzing big datasets, where it outperforms competing approaches. † These authors contributed equally.
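For context on the cost the abstract refers to, the following is a minimal sketch of a standard pointwise diffusion map in NumPy. It illustrates the baseline method only, not the compressed, region-based construction or the MGC kernel proposed in the paper. The function name `diffusion_map`, the Gaussian kernel, and the bandwidth `epsilon` are assumptions made for illustration. The dense eigendecomposition is the step whose cost grows cubically with the number of data points.

```python
import numpy as np

def diffusion_map(X, epsilon=1.0, n_components=2, t=1):
    """Standard pointwise diffusion map (illustrative baseline sketch).

    Returns the embedding and all eigenvalues of the diffusion operator,
    sorted in descending order.
    """
    # Gaussian kernel on pairwise squared distances: O(n^2) memory.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / epsilon)
    d = K.sum(axis=1)

    # P = D^{-1} K is the row-stochastic diffusion operator. Work with its
    # symmetric conjugate S = D^{-1/2} K D^{-1/2}, which has the same eigenvalues.
    S = K / np.sqrt(np.outer(d, d))

    # Dense symmetric eigendecomposition: the O(n^3) bottleneck.
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]

    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of S.
    psi = evecs / np.sqrt(d)[:, None]

    # Skip the trivial first eigenpair (eigenvalue 1, constant eigenvector)
    # and scale each coordinate by its eigenvalue raised to the diffusion time t.
    coords = psi[:, 1:n_components + 1] * evals[1:n_components + 1] ** t
    return coords, evals

# Small synthetic example.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
embedding, eigenvalues = diffusion_map(X, epsilon=2.0, n_components=2, t=1)
print(embedding.shape)
```

Because both the kernel matrix and the eigendecomposition are built over all pairs of points, this baseline becomes impractical for very large datasets, which is the limitation that motivates compressing the diffusion process to data regions.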