2019
DOI: 10.1101/642595
Preprint
Benchmarking principal component analysis for large-scale single-cell RNA-sequencing

Abstract: Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but large-scale scRNA-seq datasets require long computational times and a large memory capacity. In this work, we review 21 fast and memory-efficient PCA implementations (10 algorithms) and evaluate their application using 4 real and 18 synthetic datasets. Our benchmarking showed that some PCA algorithms are faster, more memory efficient, and more accurate than others. In consideration of the differ…
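The abstract's point — that large, sparse scRNA-seq matrices need memory-efficient PCA — can be illustrated with a minimal sketch. This is not code from the paper; it uses scikit-learn's randomized `TruncatedSVD` on a simulated sparse count matrix (all sizes and preprocessing choices here are assumptions), which operates on the sparse matrix directly instead of densifying it:

```python
# Illustrative sketch (not from the paper): memory-efficient dimensionality
# reduction on a simulated sparse scRNA-seq count matrix. TruncatedSVD with
# the randomized solver works on scipy sparse input without densifying it.
# Note: unlike exact PCA, TruncatedSVD does not mean-center the data.
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
# Simulated counts: 1,000 cells x 2,000 genes, ~95% zeros (assumed sizes).
counts = sparse.random(1000, 2000, density=0.05, random_state=0,
                       data_rvs=lambda n: rng.poisson(3.0, n) + 1).tocsr()

# Log-transform the nonzero entries (a common scRNA-seq preprocessing step).
logged = counts.copy()
logged.data = np.log1p(logged.data)

svd = TruncatedSVD(n_components=10, algorithm="randomized", random_state=0)
embedding = svd.fit_transform(logged)
print(embedding.shape)  # (1000, 10)
```

The randomized solver is one of the algorithm families this kind of benchmark compares against exact, full-rank SVD.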


Cited by 37 publications (48 citation statements)
References 130 publications
Order By: Relevance
“…Since the various PCA approaches and implementations were recently benchmarked in a similar context [17], we focused on widely used approaches that had not yet been compared: Seurat's PCA, scran's denoisePCA, and GLM-PCA [40]. When relevant, we combined them with sctransform normalization.…”
Section: Dimensionality Reduction (mentioning, confidence: 99%)
“…Another missing aspect of current benchmarking studies is their limitation to capture all aspects of the scRNAseq processing workflow. Although previous benchmarks already brought valuable recommendations for data processing, some only focused on one aspect of data processing (e.g., [14]), did not evaluate how the tool selection affects downstream analysis (e.g., [17]) or did not tackle all aspects of data processing, such as doublet identification or cell filtering (e.g., [18]). A thorough evaluation of the tools covering all major processing steps is however urgently needed as previous benchmarking studies highlighted that a combination of tools can have a drastic impact on downstream analysis, such as differential expression analysis and cell-type deconvolution [18,3].…”
(mentioning, confidence: 99%)
“…User-defined parameters for unsupervised algorithms often present themselves as "black-box" knobs with unknown consequences. Tuning these parameters can be a daunting task for the single-cell analyst, but is known to be crucial to algorithm performance (Belkina et al, 2018; Kobak and Berens, 2019; Tsuyuzaki et al, 2019).…”
Section: Parameter Optimization Plays Key Role in Structural Preservation (mentioning, confidence: 99%)
“…Numerical and computational methods for dimensionality reduction have been developed to reconstruct underlying distributions from native "gene space" and provide low-dimensional, latent representations of single-cell data for more intuitive downstream interpretation. Basic clustering methods and linear transformations such as principal component analysis (PCA) have proven to be valuable tools in this field (Sorzano, Vargas and Montano, 2014; Levine et al, 2015; Kiselev et al, 2017; Tsuyuzaki et al, 2019). However, given the distribution and sparsity of scRNA-seq data, complex, nonlinear transformations are often required to capture and visualize expression patterns.…”
Section: Introduction (mentioning, confidence: 99%)
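The two-stage pipeline this excerpt alludes to — a linear PCA step for compression, then a nonlinear embedding for visualization — can be sketched as follows. This is a hypothetical illustration, not code from the cited paper; the data is random and t-SNE stands in for whichever nonlinear method an analysis actually uses:

```python
# Hypothetical sketch of the common two-stage pipeline: PCA to compress the
# expression matrix, then a nonlinear embedding (t-SNE here) in 2D.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(1)
# Simulated log-normalized expression: 300 cells x 500 genes (assumed sizes).
X = rng.normal(size=(300, 500))

pcs = PCA(n_components=30, random_state=0).fit_transform(X)  # linear step
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(pcs)
print(emb.shape)  # (300, 2)
```

Running PCA first reduces noise and cost before the comparatively expensive nonlinear step, which is why benchmarking the PCA stage matters even for nonlinear workflows.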