Cytocipher determines significantly different populations of cells in single cell RNA-seq data

Balderson, Brad; Thor, Stefan; Bodén, Mikael

doi:10.1101/2022.08.12.503759

Cited by 1 publication

(1 citation statement)

References 50 publications


“…Furthermore, using published cell type annotations as a stand-in for ground truth cell identities risks biasing benchmarking results in favor of methods similar to those used in the original analysis. We therefore used complementary strategies in a three-tiered benchmarking approach to comprehensively compare the performance of CHOIR with that of 14 other unsupervised clustering methods [5, 6, 9, 10, 16–25] (Supplementary Table 3) in the analysis of both simulated and real data (Supplementary Fig. 1c and Supplementary Table 1).…”

Section: Benchmarking Results (citation type: mentioning)

confidence: 99%

CHOIR improves significance-based detection of cell types and states from single-cell data

Petersen, Mucke, Ryan Corces

2024

Preprint


Clustering is a critical step in the analysis of single-cell data, as it enables the discovery and characterization of putative cell types and states. However, most popular clustering tools do not subject clustering results to statistical inference testing, leading to risks of overclustering or underclustering data and often resulting in ineffective identification of cell types with widely differing prevalence. To address these challenges, we present CHOIR (clustering hierarchy optimization by iterative random forests), which applies a framework of random forest classifiers and permutation tests across a hierarchical clustering tree to statistically determine which clusters represent distinct populations. We demonstrate the enhanced performance of CHOIR through extensive benchmarking against 14 existing clustering methods across 100 simulated and 4 real single-cell RNA-seq, ATAC-seq, spatial transcriptomic, and multi-omic datasets. CHOIR can be applied to any single-cell data type and provides a flexible, scalable, and robust solution to the important challenge of identifying biologically relevant cell groupings within heterogeneous single-cell data.
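The core idea in the abstract, testing whether two candidate clusters are statistically distinct by combining a classifier with a permutation test, can be illustrated with a minimal sketch. This is not CHOIR's implementation or API: to stay dependency-free, a nearest-centroid classifier is swapped in for the random forest, the hierarchical clustering tree is omitted, and every name here (`holdout_accuracy`, `permutation_p_value`, `make_cluster`) is hypothetical. The data are synthetic Gaussian "expression" values, not real single-cell measurements.

```python
import random


def centroid(rows):
    """Per-feature mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def holdout_accuracy(cells, labels, rng):
    """Fit a nearest-centroid classifier (stand-in for a random forest)
    on a random half of the cells and score it on the other half."""
    idx = list(range(len(cells)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    groups = {0: [], 1: []}
    for i in train:
        groups[labels[i]].append(cells[i])
    if not groups[0] or not groups[1]:
        return 0.5  # degenerate split: treat as chance-level accuracy
    c0, c1 = centroid(groups[0]), centroid(groups[1])
    correct = 0
    for i in test:
        pred = 0 if sq_dist(cells[i], c0) <= sq_dist(cells[i], c1) else 1
        correct += pred == labels[i]
    return correct / len(test)


def permutation_p_value(cells, labels, n_perm=200, seed=0):
    """Compare accuracy on the true labels with accuracies obtained
    after shuffling the labels; a small p-value suggests the two
    clusters are separable beyond what chance would produce."""
    rng = random.Random(seed)
    observed = holdout_accuracy(cells, labels, rng)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if holdout_accuracy(cells, shuffled, rng) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_good + 1) / (n_perm + 1)


# Synthetic data (assumption): 5 "genes" per cell, unit-variance noise.
data_rng = random.Random(1)


def make_cluster(center, n, n_genes=5):
    return [[data_rng.gauss(center, 1.0) for _ in range(n_genes)]
            for _ in range(n)]


labels = [0] * 40 + [1] * 40

# Two well-separated populations: the split should look significant.
p_distinct = permutation_p_value(make_cluster(0.0, 40) + make_cluster(6.0, 40), labels)

# One population with an arbitrary split: no real difference to detect.
p_same = permutation_p_value(make_cluster(0.0, 80), labels)

print(f"distinct clusters: p = {p_distinct:.4f}")
print(f"arbitrary split:   p = {p_same:.4f}")
```

In a scheme of this kind, a pair of clusters whose p-value falls below a chosen threshold would be kept separate and other pairs would be merged. How CHOIR sets that threshold and traverses its clustering tree is described in the paper itself, not in this sketch.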
