Transcriptome measurements of individual cells reflect unexplored biological diversity, but are also affected by technical noise and bias. This raises the need to model and account for the resulting uncertainty in any downstream analysis. Here, we introduce Single-cell Variational Inference (scVI), a scalable framework for probabilistic representation and analysis of gene expression in single cells. scVI uses stochastic optimization and deep neural networks to aggregate information across similar cells and genes and approximate the distributions that underlie the observed expression values, while accounting for batch effects and limited sensitivity. We utilize scVI for a range of fundamental analysis tasks – including batch correction, visualization, clustering and differential expression – and demonstrate its accuracy and scalability in comparison to the state-of-the-art in each task. scVI is publicly available and can be readily used as a principled and inclusive solution for analyzing single-cell transcriptomes.
Highlights
• We define two multiplet errors in single-cell RNA-seq data: "embedded" and "neotypic"
• Neotypic errors can lead to misidentification of cell types or transitional states
• Scrublet code identifies neotypic doublets and predicts the overall doublet rate
• The algorithm is tested against several experimental methods for labeling multiplets
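To make the idea of scoring doublets concrete, the following is a minimal sketch of a simulate-and-classify approach: synthetic doublets are built by summing random pairs of observed cells, and each observed cell is scored by the fraction of synthetic doublets among its nearest neighbours. This is an illustration under simplifying assumptions, not the Scrublet implementation. The function names (`simulate_doublets`, `normalize`, `doublet_scores`), the normalization, and the parameter defaults are hypothetical choices made for this example.

```python
import math
import random


def simulate_doublets(counts, n_doublets, rng):
    """Build synthetic doublets by summing the counts of random cell pairs."""
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.choice(counts), rng.choice(counts)
        doublets.append([x + y for x, y in zip(a, b)])
    return doublets


def normalize(cell):
    """Scale a cell to a fixed total count, then log-transform (assumed preprocessing)."""
    total = sum(cell) or 1
    return [math.log1p(1e4 * x / total) for x in cell]


def doublet_scores(counts, k=5, n_doublets=None, seed=0):
    """Score each observed cell by the fraction of synthetic doublets
    among its k nearest neighbours (brute-force Euclidean search)."""
    rng = random.Random(seed)
    n_cells = len(counts)
    synthetic = simulate_doublets(counts, n_doublets or n_cells, rng)
    # Observed cells come first, so any index >= n_cells is a synthetic doublet.
    profiles = [normalize(c) for c in list(counts) + synthetic]
    scores = []
    for i in range(n_cells):
        dists = sorted(
            (math.dist(profiles[i], profiles[j]), j)
            for j in range(len(profiles))
            if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        scores.append(sum(j >= n_cells for j in neighbours) / len(neighbours))
    return scores


# Toy example: two "cell types" plus one mixed profile (made-up counts).
cells = [[10, 0, 0], [9, 1, 0], [0, 10, 0], [0, 9, 1], [5, 5, 0]]
scores = doublet_scores(cells, k=3)
```

Because profiles are normalized before comparison, a synthetic doublet formed from two cells of the same type lands close to ordinary cells of that type, whereas one formed from two different types lands between them. This loosely mirrors the distinction the highlights draw between "embedded" and "neotypic" errors.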