Single-cell RNA-sequencing (scRNA-seq) is a powerful tool to quantify transcriptional states in thousands to millions of cells. It is increasingly common for scRNA-seq data to be collected in multiple experimental conditions, yet quantifying differences between scRNA-seq datasets remains an analytical challenge. Previous efforts at quantifying such differences focus on discrete regions of the transcriptional state space such as clusters of cells. Here, we describe a continuous measure of the effect of an experiment across the transcriptomic space. First, we use the manifold assumption to model the cellular state space as a graph (or network) with cells as nodes and edges connecting cells with similar transcriptomic profiles. Next, we create an Enhanced Experimental Signal (EES) that estimates the likelihood of observing cells from each condition at every point in the manifold. We show that the EES has useful properties and information that can be extracted. The EES can be used to identify how gene expression is affected by a given perturbation, including identifying non-monotonic changes from only two conditions. We also show that we can use both the magnitude and frequency of the EES, using an algorithm we call vertex frequency clustering, to derive subsets of cells at appropriate levels of granularity (tailored to areas that change) that are enriched in the experimental or control conditions or that are unaffected between conditions. We demonstrate both algorithms using a combination of biological and synthetic datasets. Implementations are provided in the MELD Python package, which is available at https://github.com/KrishnaswamyLab/MELD.
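To make the two steps above concrete (a cell-similarity graph, then a per-condition likelihood at every cell), the following is a minimal sketch and not the MELD implementation. It assumes an unweighted k-nearest-neighbor graph, uses repeated neighbor averaging as a simple stand-in for the smoothing filter, and runs on synthetic two-dimensional data. The function names `knn_adjacency` and `smoothed_condition_likelihood` and the parameters `k` and `steps` are hypothetical, introduced only for illustration.

```python
import numpy as np


def knn_adjacency(X, k=10):
    """Symmetric, unweighted k-nearest-neighbor adjacency matrix with self-loops."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                    # exclude each cell from its own neighbor list
    nn = np.argsort(d2, axis=1)[:, :k]              # indices of the k closest cells
    n = X.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    A = np.maximum(A, A.T)                          # symmetrize
    np.fill_diagonal(A, 1.0)                        # self-loops keep every row nonzero
    return A


def smoothed_condition_likelihood(X, labels, k=10, steps=3):
    """Smooth one indicator signal per condition over the graph, then
    normalize across conditions so that each cell's values sum to 1."""
    A = knn_adjacency(X, k)
    P = A / A.sum(axis=1, keepdims=True)            # row-stochastic averaging operator
    conditions = np.unique(labels)
    indicator = (labels[:, None] == conditions[None, :]).astype(float)
    indicator /= indicator.sum(axis=0, keepdims=True)  # each condition's signal sums to 1
    density = indicator
    for _ in range(steps):
        density = P @ density                       # assumption: neighbor averaging, not the MELD filter
    likelihood = density / density.sum(axis=1, keepdims=True)
    return conditions, likelihood


# Synthetic example: cells with a larger first coordinate are more likely to be "treated".
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
p_treated = 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))
labels = np.where(rng.random(n) < p_treated, "treated", "control")

conditions, likelihood = smoothed_condition_likelihood(X, labels)
treated_col = list(conditions).index("treated")
```

Each row of `likelihood` holds one value per condition and sums to 1, so `likelihood[:, treated_col]` can be read as a continuous, per-cell estimate of enrichment in the treated condition rather than a per-cluster one.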
As single-cell RNA-sequencing (scRNA-seq) has become more accessible, the design of single-cell experiments has become increasingly complex. Researchers regularly use scRNA-seq to quantify the effect of a drug, gene knockout, or other experimental perturbation on a biological system. However, quantifying the compositional differences between single-cell datasets collected from multiple experimental conditions remains an analytical challenge [1] because of the heterogeneity and noise in both the data and the effects of a given perturbation.

Previous work has shown the utility of modelling the transcriptomic state space as a continuous low-dimensional manifold, or set of manifolds, to characterize cellular heterogeneity and dynamic biological processes [2][3][4][5][6][7][8]. In the manifold model, the biologically valid combinations of gene expression are represented as a smooth, low-dimensional surface in a high-dimensional space, such as a two-dimensional sheet embedded in three dimensions. The main challenge in developing tools to quantify compositional differences between single-cell datasets is that each dataset comprises several intrinsic structures of heterogeneous cells, and the effect of the experimental condition could be diffuse or isolated to particular areas of...