2019
DOI: 10.1101/765628
Preprint

PARC: ultrafast and accurate clustering of phenotypic data of millions of single cells

Abstract: Motivation: New single-cell technologies continue to fuel the explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods are inadequately scalable to large datasets and therefore cannot uncover the complex cellular heterogeneity. Results: We introduce a highly scalable graph-based clustering algorithm, PARC (phenotyping by accelerated refined community-partitioning), for ultralarge-scale, high-dimensional single-cell data (> 1 million cells). Using large single cell mas…

Cited by 17 publications (24 citation statements)
References 60 publications
“…Then principal component analysis (PCA) was performed utilizing the HVGs and Harmony algorithm was used to remove batch effects 25 . We used the PARC approach to identify clusters 37 and selected features by "FeatureSelectionByEnrichment" function from cytograph2 algorithm 38 , followed by another round of PCA, Harmony, and PARC. Subsequently, we calculated K nearest neighbors in a KNN graph, performed uniform manifold approximation and projection (UMAP) by Pegasuspy 39 , and identified clusters by PARC.…”
Section: scRNA-seq Data Analysis and Statistics
Citation type: mentioning
confidence: 99%
“…26 We varied the number of components from 5 to 115. We applied FAUST, DEPECHE, 27 flowMeans, 10,28 FlowSOM, 6,29 k-means, PARC, 30 and PhenoGraph 31,32 to the simulated datasets (supplemental experimental procedures A.7 shows an example of the simulated data). We provided k-means with the true number of clusters in each simulation iteration, a required parameter setting.…”
Section: FAUST Resolves High-dimensional Structure in Simulation Studies
Citation type: mentioning
confidence: 99%
“…Algorithm. VIA first represents the single-cell data as a cluster graph (i.e., each node is a cluster of single cells), computed by our recently developed data-driven community-detection algorithm, PARC, which allows scalable clustering whilst preserving global properties of the topology needed for accurate TI 14 (Step 1 in Fig. 1).…”
Section: Results
Citation type: mentioning
confidence: 99%
“…VIA first represents the single-cell data in a k-nearest-neighbor (KNN) graph where each node is a cluster of single cells. The clusters are computed by our recently developed clustering algorithm, PARC 14 . In brief, PARC is built on hierarchical navigable small world 46 accelerated KNN graph construction and a fast community-detection algorithm (Leiden method 47 ), which is further refined by data-driven pruning.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
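
The last statement names three ingredients: KNN graph construction, community detection, and data-driven pruning. The sketch below illustrates that general shape in plain Python and is not the PARC implementation or its API. Several pieces are assumptions made for illustration: an exact brute-force neighbor search stands in for the hierarchical navigable small world index, simple label propagation stands in for the Leiden method, and the pruning rule (drop edges longer than the global mean plus `n_std` standard deviations) is a hypothetical choice. The function names `knn_graph`, `prune_edges`, `label_propagation`, and `cluster` are likewise invented here.

```python
import math
import random
from collections import Counter


def knn_graph(points, k):
    """Exact brute-force KNN search (stand-in for an approximate HNSW index)."""
    graph = {}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = dists[:k]  # list of (distance, neighbor index)
    return graph


def prune_edges(graph, n_std=1.0):
    """Drop edges longer than mean + n_std * std of all edge lengths.

    The threshold rule is an assumption for illustration only.
    Returns a symmetric adjacency mapping: node -> set of neighbors.
    """
    lengths = [d for nbrs in graph.values() for d, _ in nbrs]
    mean = sum(lengths) / len(lengths)
    std = math.sqrt(sum((d - mean) ** 2 for d in lengths) / len(lengths))
    cutoff = mean + n_std * std
    adj = {i: set() for i in graph}
    for i, nbrs in graph.items():
        for d, j in nbrs:
            if d <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def label_propagation(adj, n_iter=20, seed=0):
    """Label propagation (a simple stand-in for the Leiden method)."""
    rng = random.Random(seed)
    labels = {i: i for i in adj}  # every node starts in its own community
    nodes = list(adj)
    for _ in range(n_iter):
        rng.shuffle(nodes)
        changed = False
        for i in nodes:
            if not adj[i]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[j] for j in adj[i])
            best = max(counts.values())
            # Adopt the most common neighbor label; break ties by smallest label.
            new = min(label for label, c in counts.items() if c == best)
            if new != labels[i]:
                labels[i] = new
                changed = True
        if not changed:
            break
    return labels


def cluster(points, k=5, n_std=1.0, seed=0):
    """KNN graph -> edge pruning -> community detection."""
    return label_propagation(prune_edges(knn_graph(points, k), n_std), seed=seed)


if __name__ == "__main__":
    rng = random.Random(42)
    blob_a = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(30)]
    blob_b = [(10 + rng.uniform(-1, 1), 10 + rng.uniform(-1, 1)) for _ in range(30)]
    labels = cluster(blob_a + blob_b, k=5)
    print("communities found:", len(set(labels.values())))
```

Because edges only ever connect near neighbors, two well-separated groups of points cannot end up sharing a community label in this sketch, although a single group may be split into several communities. The brute-force search is quadratic in the number of cells, which is exactly the cost an approximate index is meant to avoid at the million-cell scale described in the abstract.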