2017
DOI: 10.1101/174029
Preprint

Scanpy for analysis of large-scale single-cell gene expression data

Abstract: We present Scanpy, a scalable toolkit for analyzing single-cell gene expression data. It includes preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing and simulation of gene regulatory networks. The Python-based implementation efficiently deals with datasets of more than one million cells and enables easy interfacing of advanced machine learning packages. Code is available from https://github.com/theislab/scanpy.

Cited by 996 publications
(1,432 citation statements)
References 43 publications
“…Although cross‐environment support is growing (preprint: Scholz et al , ), the choice of programming language is often also a choice between analysis tools. Popular platforms such as Seurat (Butler et al , ), Scater (McCarthy et al , ), or Scanpy (Wolf et al , ) provide integrated environments to develop pipelines and contain large analysis toolboxes. However, out of necessity these platforms limit themselves to tools developed in their respective programming languages.…”
Section: Introduction (mentioning)
confidence: 99%
“…The most common biological data correction is to remove the effects of the cell cycle on the transcriptome. This data correction can be performed by a simple linear regression against a cell cycle score as implemented in the Scanpy and Seurat platforms (Butler et al , ; Wolf et al , ) or in specialized packages with more complex mixture models such as scLVM (Buettner et al , ) or f‐scLVM (Buettner et al , ). Lists of marker genes to compute cell cycle scores are obtained from the literature (Macosko et al , ).…”
Section: Introduction (mentioning)
confidence: 99%
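
The "simple linear regression against a cell cycle score" mentioned in this statement can be illustrated with a minimal sketch. It is not the Scanpy or Seurat implementation: the function name `regress_out_score` and the toy values are hypothetical, and only the Python standard library is used. For one gene, it fits expression = intercept + slope × score by ordinary least squares and keeps the residuals as the corrected expression.

```python
from statistics import fmean


def regress_out_score(expression, score):
    """Return residuals of a least-squares fit: expression ~ intercept + slope * score.

    `expression` and `score` hold one value per cell (hypothetical inputs).
    """
    if len(expression) != len(score):
        raise ValueError("expression and score must have one value per cell")

    mean_score = fmean(score)
    mean_expr = fmean(expression)

    # Sum of squares of the score and cross-products with the expression.
    sxx = sum((s - mean_score) ** 2 for s in score)
    if sxx == 0:
        # A constant score explains nothing; only the mean is removed.
        return [e - mean_expr for e in expression]
    sxy = sum((s - mean_score) * (e - mean_expr) for s, e in zip(score, expression))

    slope = sxy / sxx
    intercept = mean_expr - slope * mean_score

    # The residual is the part of the expression not explained by the score.
    return [e - (intercept + slope * s) for s, e in zip(score, expression)]


# Toy data (made up): expression that depends only on the cell cycle score.
cell_cycle_score = [0.0, 1.0, 2.0, 3.0]
gene_expression = [1.0, 3.0, 5.0, 7.0]
residuals = regress_out_score(gene_expression, cell_cycle_score)
```

In this toy case the gene is fully explained by the score, so the residuals are all zero. A real analysis would apply the same fit to every gene, typically with the platform's own routines rather than a hand-written loop.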
“…Scores for gene signatures were calculated using Scanpy's score_genes method (Wolf et al , ), and code is available on our GitHub (https://github.com/MSingerLab/COMETSC).…”
Section: Methods (mentioning)
confidence: 99%
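
As a rough illustration of what a gene signature score measures, the sketch below computes, for one cell, the mean expression of the signature genes minus the mean expression of a set of control genes. This is a simplified stand-in, not Scanpy's `score_genes` itself: the function `signature_score`, the gene names, and the fixed control set are assumptions made for the example, whereas Scanpy samples its reference genes from expression-matched bins.

```python
from statistics import fmean


def signature_score(cell_expression, signature_genes, control_genes):
    """Mean expression of signature genes minus mean expression of control genes.

    `cell_expression` maps gene name -> expression value for a single cell.
    All names here are hypothetical and chosen for illustration only.
    """
    signature_mean = fmean([cell_expression[g] for g in signature_genes])
    control_mean = fmean([cell_expression[g] for g in control_genes])
    return signature_mean - control_mean


# Toy cell with made-up gene names and values.
cell = {"GeneA": 4.0, "GeneB": 2.0, "GeneC": 1.0, "GeneD": 1.0}
score = signature_score(cell, ["GeneA", "GeneB"], ["GeneC", "GeneD"])
```

Here the signature genes average 3.0 and the control genes average 1.0, giving a score of 2.0. A positive score indicates that the signature is expressed above the chosen background in that cell.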
“…In order to achieve good performance, however, the datasets often need to be carefully preprocessed, and the algorithms require non-intuitive hyperparameter tuning. To address specific computational challenges of single-cell RNA-Seq datasets, researchers have developed a wide array of application-specific clustering algorithms [28][29][30][31][32][33][34] and packages for end-to-end analysis 21,[35][36][37][38][39] . Regardless of which set of these tools one uses, finding the right approach for clustering a specific dataset requires careful design of the computational workflow, but often finding a good combination of clustering algorithm and hyperparameters is time-consuming and difficult.…”
Section: Introduction (mentioning)
confidence: 99%