2020
DOI: 10.12688/f1000research.15666.3

A systematic performance evaluation of clustering methods for single-cell RNA-seq data

Abstract: Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 14 clustering algorithms implemented in R, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were…
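As a rough illustration of what evaluating a clustering against known cell labels can involve, the sketch below computes the adjusted Rand index, one commonly used agreement measure between two partitions. The truncated abstract does not say which metrics the study used, so the choice of measure, the function name `adjusted_rand_index`, and the toy labels are assumptions made for illustration only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two partitions of the same cells, corrected for chance.

    Illustrative sketch only; not taken from the evaluated paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label vectors must cover the same cells")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one cell: the partitions trivially agree

    # Pairs of cells grouped together in both partitions, in A only, in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # both partitions are trivial (all singletons or one cluster)
    return (together_both - expected) / (maximum - expected)


# Hypothetical example: cluster names differ, but the grouping is identical.
truth = ["B", "B", "T", "T"]
predicted = [1, 1, 0, 0]
print(adjusted_rand_index(truth, predicted))
```

The score depends only on which cells are grouped together, so relabelling clusters does not change it.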

Cited by 112 publications (149 citation statements)
References 52 publications
“…GFP-labeled cells from Msi1 CreERT2 ; R26 mTmG mice were sorted 15 h after TAM induction, and subjected to scRNA-seq (Figure S3A; Table S1). Unsupervised clustering (Duò et al, 2018) identified nine distinct cell clusters (Figure 2A). We utilized the differentially expressed gene signatures to assign putative cell type identities to these clusters (Figures 2B–2D, S3B, and S3C).…”
Section: Results (mentioning)
confidence: 99%
“…BingleSeq’s scRNA-Seq pipeline includes three unsupervised clustering solutions provided by monocle, Seurat, and SC3 packages. The latter two packages are regarded as having the best overall clustering performance (Duò, Robinson & Soneson, 2018; Freytag et al, 2018). However, similarly to packages used in the DE analysis of Bulk RNA-Seq data, there seems to be little consensus on which package provides the best-performing clustering approach.…”
Section: Discussion (mentioning)
confidence: 99%
“…Kiselev, Andrews & Hemberg (2019) suggest that Seurat may be inappropriate for small scRNA-Seq datasets, due to the inherent limitations of the Louvain algorithm. On the contrary, as a way to amend for the limitations of k-means clustering algorithm used in SC3, the authors implemented an extensive iterative-consensus approach, which makes SC3 magnitudes slower than Seurat and downgrades its scalability (Duò, Robinson & Soneson, 2018; Kiselev, Andrews & Hemberg, 2019). Another difference between these two packages is that Seurat does not include functionality to estimate or explicitly specify cluster number, while SC3 does.…”
Section: Discussion (mentioning)
confidence: 99%
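The consensus idea mentioned in the statement above can be sketched in general terms: run a clustering several times, then record how often each pair of cells lands in the same cluster. The code below is a generic co-association matrix, not SC3's actual implementation. The function name `consensus_matrix` and the toy label vectors are hypothetical.

```python
def consensus_matrix(runs):
    """Fraction of runs in which each pair of cells shares a cluster.

    `runs` is a list of label vectors, one per clustering run, all covering
    the same cells in the same order. Generic sketch, not SC3's method.
    """
    if not runs:
        raise ValueError("at least one clustering run is required")
    n_cells = len(runs[0])
    if any(len(labels) != n_cells for labels in runs):
        raise ValueError("every run must label the same number of cells")

    matrix = [[0.0] * n_cells for _ in range(n_cells)]
    for labels in runs:
        for i in range(n_cells):
            for j in range(n_cells):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0

    # Convert co-occurrence counts into fractions of runs.
    return [[value / len(runs) for value in row] for row in matrix]


# Hypothetical labels from three repeated runs on four cells.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first run, different label names
    [0, 0, 0, 1],
]
for row in consensus_matrix(runs):
    print([round(value, 2) for value in row])
```

Each run adds a full pass over all cell pairs, which gives some intuition for why consensus-style approaches cost more than a single clustering pass.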
“…A difficulty with single-cell RNA-Seq data is its high cell-to-cell variation due to low sampling depth and transcriptional bursting [4], which makes it challenging to extract useful information when comparing the transcriptomes of individual cells. To remedy this, a variety of cell clustering algorithms have been developed [5][6][7], which reduce variation by enabling comparisons between cell populations (defined here as collections of single cells with associated count matrices) instead of individual cells.…”
Section: Introduction (mentioning)
confidence: 99%
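The statement above describes comparing cell populations rather than individual cells. A minimal sketch of that idea, assuming a small cells-by-genes count matrix and one cluster label per cell, is to average the counts within each cluster. The function name `cluster_mean_profiles` and the toy data are hypothetical and do not come from the citing paper.

```python
def cluster_mean_profiles(counts, labels):
    """Average the per-cell gene counts within each cluster.

    `counts` holds one list of gene counts per cell; `labels` holds one
    cluster label per cell. Returns {cluster label: mean count per gene}.
    """
    if len(counts) != len(labels):
        raise ValueError("need exactly one label per cell")

    sums = {}
    sizes = {}
    for cell_counts, label in zip(counts, labels):
        if label not in sums:
            sums[label] = [0.0] * len(cell_counts)
            sizes[label] = 0
        elif len(cell_counts) != len(sums[label]):
            raise ValueError("all cells must cover the same genes")
        for gene_index, value in enumerate(cell_counts):
            sums[label][gene_index] += value
        sizes[label] += 1

    return {
        label: [total / sizes[label] for total in totals]
        for label, totals in sums.items()
    }


# Hypothetical toy data: three cells, two genes, two clusters.
counts = [[0, 4], [2, 0], [10, 1]]
labels = ["a", "a", "b"]
print(cluster_mean_profiles(counts, labels))
```

Averaging smooths out zero counts in individual cells, which is the sense in which population-level comparison reduces cell-to-cell variation.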
“…The purpose of clustering is to create cell populations based on biological traits, such as cell type or different cell states. Despite the variety of clustering algorithms available, misclassification of cells, where some cells assigned to a cluster exhibit a greater biological difference from the others, is still a large problem [6][7][8]. The difficulty increases when trying to separate more biologically similar cells, such as subsets of B cells, since in such cases the technical noise becomes relatively higher compared to the biological variation.…”
Section: Introduction (mentioning)
confidence: 99%