Scanpy for analysis of large-scale single-cell gene expression data

Wolf, F. Alexander; Angerer, Philipp; Theis, Fabian J.

doi:10.1101/174029

Cited by 996 publications (1,432 citation statements)

References 43 publications

Supporting

Mentioning: 1,427

Contrasting


“…Although cross‐environment support is growing (preprint: Scholz et al , ), the choice of programming language is often also a choice between analysis tools. Popular platforms such as Seurat (Butler et al , ), Scater (McCarthy et al , ), or Scanpy (Wolf et al , ) provide integrated environments to develop pipelines and contain large analysis toolboxes. However, out of necessity these platforms limit themselves to tools developed in their respective programming languages.…”

Section: Introduction (mentioning)

confidence: 99%

“…The most common biological data correction is to remove the effects of the cell cycle on the transcriptome. This data correction can be performed by a simple linear regression against a cell cycle score as implemented in the Scanpy and Seurat platforms (Butler et al , ; Wolf et al , ) or in specialized packages with more complex mixture models such as scLVM (Buettner et al , ) or f‐scLVM (Buettner et al , ). Lists of marker genes to compute cell cycle scores are obtained from the literature (Macosko et al , ).…”

Section: Introduction (mentioning)

confidence: 99%
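
The linear-regression correction described in the statement above can be sketched as follows. This is a minimal, standard-library illustration that assumes one precomputed cell cycle score per cell; the function name `regress_out_score`, the gene names, and the toy values are hypothetical, and the code is not the Scanpy or Seurat implementation.

```python
def regress_out_score(expression, score):
    """Remove the linear effect of a per-cell score from each gene.

    expression: dict mapping gene name -> list of expression values (one per cell)
    score:      list with one covariate value per cell (e.g. a cell cycle score)
    Returns a dict of residuals from an ordinary least-squares fit
    value ~ intercept + slope * score, computed gene by gene.
    """
    n = len(score)
    mean_s = sum(score) / n
    var_s = sum((s - mean_s) ** 2 for s in score)
    if var_s == 0:
        raise ValueError("score must vary across cells")

    corrected = {}
    for gene, values in expression.items():
        mean_y = sum(values) / n
        cov = sum((s - mean_s) * (y - mean_y) for s, y in zip(score, values))
        slope = cov / var_s
        intercept = mean_y - slope * mean_s
        # Residual = observed value minus the part explained by the score.
        corrected[gene] = [y - (intercept + slope * s) for s, y in zip(score, values)]
    return corrected


# Toy data (assumed for illustration): four cells, two genes.
cell_cycle_score = [0.0, 1.0, 2.0, 3.0]
expression = {
    "geneA": [1.0, 3.0, 5.0, 7.0],  # fully explained by the score
    "geneB": [2.0, 2.0, 4.0, 4.0],  # only partly explained
}
residuals = regress_out_score(expression, cell_cycle_score)
```

In this toy case the residuals of `geneA` vanish because its expression is an exact linear function of the score, while `geneB` keeps the variation the score does not explain.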

“…Scater has a particular strength in QC and pre‐processing, while Seurat is arguably the most popular and comprehensive platform, which includes a large array of tools and tutorials. A recent addition to this group is scanpy (Wolf et al , ), a growing Python‐based platform, which exhibits improved scaling to larger numbers of cells. It leverages the increasing number of tools written in Python, which is particularly popular for machine learning applications.…”

Section: Introduction (mentioning)

confidence: 99%


Current best practices in single‐cell RNA‐seq analysis: a tutorial

Luecken, Theis. 2019. Molecular Systems Biology.


Single‐cell RNA‐seq has enabled gene expression to be studied at an unprecedented resolution. The promise of this technology is attracting a growing user base for single‐cell analysis methods. As more analysis tools are becoming available, it is becoming increasingly difficult to navigate this landscape and produce an up‐to‐date workflow to analyse one's data. Here, we detail the steps of a typical single‐cell RNA‐seq analysis, including pre‐processing (quality control, normalization, data correction, feature selection, and dimensionality reduction) and cell‐ and gene‐level downstream analysis. We formulate current best‐practice recommendations for these steps based on independent comparison studies. We have integrated these best‐practice recommendations into a workflow, which we apply to a public dataset to further illustrate how these steps work in practice. Our documented case study can be found at https://www.github.com/theislab/single-cell-tutorial. This review will serve as a workflow tutorial for new entrants into the field, and help established users update their analysis pipelines.


“…Scores for gene signatures were calculated using Scanpy's score_genes method (Wolf et al , ), and code is available on our GitHub (https://github.com/MSingerLab/COMETSC).…”

Section: Methods (mentioning)

confidence: 99%
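
As a rough illustration of what a gene signature score measures, the sketch below computes, for each cell, the mean expression of the signature genes minus the mean expression of a reference gene set. It is a simplified stand-in, not Scanpy's `score_genes` implementation: here the reference set is passed in explicitly by the caller, and the function name `signature_score`, the gene names, and the values are all hypothetical.

```python
def signature_score(expression, signature, reference):
    """Per-cell score: mean(signature genes) - mean(reference genes).

    expression: dict mapping gene name -> list of expression values (one per cell)
    signature:  list of gene names that define the signature
    reference:  list of gene names used as the background (assumed given)
    """
    n_cells = len(next(iter(expression.values())))

    def mean_over(genes, cell):
        return sum(expression[g][cell] for g in genes) / len(genes)

    return [
        mean_over(signature, cell) - mean_over(reference, cell)
        for cell in range(n_cells)
    ]


# Toy data (assumed for illustration): three cells, four genes.
expression = {
    "sigA": [2.0, 0.0, 4.0],
    "sigB": [4.0, 0.0, 2.0],
    "refA": [1.0, 1.0, 1.0],
    "refB": [1.0, 3.0, 1.0],
}
scores = signature_score(expression, ["sigA", "sigB"], ["refA", "refB"])
```

Cells that express the signature above the background get a positive score; the second toy cell, which expresses neither signature gene, gets a negative one.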

Combinatorial prediction of marker panels from single‐cell transcriptomic data

Delaney, Schnell, Cammarata, et al. 2019. Molecular Systems Biology.


Single‐cell transcriptomic studies are identifying novel cell populations with exciting functional roles in various in vivo contexts, but identification of succinct gene marker panels for such populations remains a challenge. In this work, we introduce COMET, a computational framework for the identification of candidate marker panels consisting of one or more genes for cell populations of interest identified with single‐cell RNA‐seq data. We show that COMET outperforms other methods for the identification of single‐gene panels and enables, for the first time, prediction of multi‐gene marker panels ranked by relevance. Staining by flow cytometry assay confirmed the accuracy of COMET's predictions in identifying marker panels for cellular subtypes, at both the single‐ and multi‐gene levels, validating COMET's applicability and accuracy in predicting favorable marker panels from transcriptomic input. COMET is a general non‐parametric statistical framework and can be used as‐is on various high‐throughput datasets in addition to single‐cell RNA‐sequencing data. COMET is available for use via a web interface (http://www.cometsc.com/) or a stand‐alone software package (https://github.com/MSingerLab/COMETSC).


“…In order to achieve good performance, however, the datasets often need to be carefully preprocessed, and the algorithms require non-intuitive hyperparameter tuning. To address specific computational challenges of single-cell RNA-Seq datasets, researchers have developed a wide array of application-specific clustering algorithms [28-34] and packages for end-to-end analysis [21, 35-39]. Regardless of which set of these tools one uses, finding the right approach for clustering a specific dataset requires careful design of the computational workflow, but often finding a good combination of clustering algorithm and hyperparameters is time-consuming and difficult.…”

Section: Introduction (mentioning)

confidence: 99%

An Interpretable Framework for Clustering Single-Cell RNA-Seq Datasets

Zhang, Fan, et al. 2017. Preprint.


With the recent proliferation of single-cell RNA-Seq experiments, several methods have been developed for unsupervised analysis of the resulting datasets. These methods often rely on unintuitive hyperparameters and do not explicitly address the subjectivity associated with clustering. In this work, we present DendroSplit, an interpretable framework for analyzing single-cell RNA-Seq datasets that addresses both these issues. Under this framework, we cluster using feature selection to uncover multiple levels of biologically meaningful populations in the data. We analyze several landmark single-cell datasets, demonstrating both the method's efficacy and computational efficiency. We provide the full DendroSplit software package at https://github.com/jessemzhang/dendrosplit.

