2019
DOI: 10.1101/562082
Preprint

Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data

Abstract: Identification of cell type subpopulations from complex cell mixtures using single-cell RNA-sequencing (scRNAseq) data includes automated computational steps like data normalization, dimensionality reduction and cell clustering. However, assigning cell type labels to cell clusters is still conducted manually by most researchers, resulting in limited documentation, low reproducibility and uncontrolled vocabularies. Two bottlenecks to automating this task are the scarcity of reference cell type gene expression s…

Cited by 7 publications (7 citation statements)
References 24 publications
“…In contrast to the extensive evaluations of clustering, differential expression, and trajectory inference methods [10][11][12], there is currently only a single attempt comparing methods to assign cell type labels to cell clusters [13]. The lack of a comprehensive comparison of scRNA-seq classification methods leaves users without indications as to which classification method best fits their problem.…”
Section: Introduction (mentioning)
confidence: 99%
“…The data processing aspect, depending on whether to aggregate data to the subpopulation-sample level, is described in the schematic in Figure 1d. The methods presented here are modular and thus the subpopulation label could originate from an earlier step in the analysis, such as clustering [40,41,42] after integration [43,9] or after inference of cell-type labels at the subpopulation- [10] or cell-level [11]. The specific details and suitability of these various preprocessing steps is an active area of current research and a full evaluation of them is beyond the scope of the current work; a comprehensive review was recently made available [44].…”
Section: Results (mentioning)
confidence: 99%
“…In our framework, a subpopulation is simply a set of cells deemed to be similar enough to be considered as a group and where it is of interest to interrogate such sets of similarly-defined cells across multiple samples and conditions. Therefore, cells from a scRNA-seq experiment are first organized into subpopulations, e.g., by integrating the multiple samples together [9] and clustering or applying a subpopulation-level assignment algorithm [10] or cell-level prediction [11]; clustering and manual annotation is also an option. Regardless of the mode or the uncertainty in subpopulation assignment, the discovery framework we describe provides a basis for biological interpretation and a path to discovering interesting expression patterns within subpopulations across samples.…”
Section: Introduction (mentioning)
confidence: 99%
“…Top-k accuracy gain from probabilistic method: Both the CellMeSH database and the probabilistic query method contribute to the overall top-k accuracy gains of CellMeSH. To isolate the contribution of the probabilistic query method to the overall CellMeSH performance, we compared it to the more established hypergeometric test [23] and GSVA [24] that are suggested in [60], by querying the same CellMeSH database for the three mouse datasets.…”
Section: Performance Of Top-k Accuracy (mentioning)
confidence: 99%
“…In order to use a hypergeometric test to query a weighted database with a set of genes, the database first needs to be binarized [60]. We binarize the noisy CellMeSH database by setting the raw gene-cell count values in the database to 0 if the count is below a threshold (e.g.…”
Section: Hypergeometric Test (mentioning)
confidence: 99%
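
The binarize-then-test procedure described in the last statement can be illustrated with a minimal Python sketch. It is not the CellMeSH implementation or the method evaluated in the preprint: the function names (`binarize`, `hypergeom_pvalue`, `rank_cell_types`), the dictionary layout of the database, and the threshold value are all assumptions made for illustration, and the quoted statement does not give the threshold actually used.

```python
from math import comb


def binarize(db, threshold):
    """Turn a weighted database {cell_type: {gene: count}} into marker sets.

    Counts below `threshold` are treated as 0 (gene dropped); the rest are kept.
    The data layout and threshold are illustrative assumptions.
    """
    return {
        cell_type: {gene for gene, count in genes.items() if count >= threshold}
        for cell_type, genes in db.items()
    }


def hypergeom_pvalue(overlap, n_universe, n_markers, n_query):
    """Upper-tail probability P(X >= overlap) of a hypergeometric distribution.

    X counts marker genes among n_query genes drawn without replacement
    from a universe of n_universe genes, n_markers of which are markers.
    """
    upper = min(n_markers, n_query)
    tail = sum(
        comb(n_markers, i) * comb(n_universe - n_markers, n_query - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(n_universe, n_query)


def rank_cell_types(query_genes, db, universe, threshold):
    """Rank cell types by enrichment p-value (smallest first)."""
    universe = set(universe)
    query = set(query_genes) & universe
    results = []
    for cell_type, markers in binarize(db, threshold).items():
        markers &= universe  # only genes in the universe can be drawn
        overlap = len(query & markers)
        p = hypergeom_pvalue(overlap, len(universe), len(markers), len(query))
        results.append((cell_type, p))
    return sorted(results, key=lambda item: item[1])


# Toy example with made-up gene names and counts.
toy_universe = [f"g{i}" for i in range(10)]
toy_db = {
    "T cell": {"g0": 5, "g1": 4, "g2": 1, "g3": 3},
    "B cell": {"g4": 6, "g5": 2, "g0": 1},
}
ranking = rank_cell_types(["g0", "g1", "g3"], toy_db, toy_universe, threshold=3)
```

In this toy setup the query genes overlap only the thresholded "T cell" markers, so "T cell" ranks first. A real analysis would also need multiple-testing correction across cell types, which the sketch omits.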