The massive growth of single-cell RNA-sequencing (scRNAseq) and of the methods for its analysis still lacks sufficient and up-to-date benchmarks that could guide analytical choices. Moreover, current studies are often focused on isolated steps of the process. Here, we present a flexible R framework for pipeline comparison with multi-level evaluation metrics and apply it to the benchmarking of scRNAseq analysis pipelines using datasets with known cell identities. We evaluate common steps of such analyses, including filtering, doublet detection, normalization, feature selection, denoising, dimensionality reduction and clustering. On the basis of these analyses, we make a number of concrete recommendations about analysis choices. The evaluation framework, pipeComp, has been implemented so as to easily integrate any other step or tool, allowing extensible benchmarks and easy application to other fields (https://github.com/plger/pipeComp).

Background

Single-cell RNA-sequencing (scRNAseq) and the set of attached analysis methods are evolving fast, with more than 560 software tools available to the community [1], roughly half of which are dedicated to tasks related to data processing such as clustering, ordering, dimension reduction or normalization. This increase in the number of available tools follows the development of new sequencing technologies and the growing number of reported cells, genes and cell populations [2]. As data processing is a key step in any scRNAseq analysis, affecting downstream results and their interpretation, it is critical to evaluate the available tools.
A number of good comparison and benchmark studies have already been performed on various steps related to scRNAseq processing and analysis and can guide the choice of methodology (e.g., […]). However, these recommendations need constant updating and often leave many details of an analysis open. Another shortcoming of current benchmarking studies is that they fail to capture the scRNAseq processing workflow in its entirety. Although previous benchmarks have already brought valuable recommendations for data processing, some focused on only one aspect of data processing (e.g., [14]), did not evaluate how the tool selection affects downstream analysis (e.g., [17]), or did not tackle all aspects of data processing, such as doublet identification or cell filtering (e.g., [18]). A thorough evaluation of the tools covering all major processing steps is, however, urgently needed, as previous benchmarking studies highlighted that the combination of tools used can have a drastic impact on downstream analyses, such as differential expression analysis and cell-type deconvolution [18,3]. It is therefore critical to evaluate not only the individual effect of a preprocessing method, but also its positive or negative interaction with all other parts of a workflow.
Here, we develop a flexible R framework for pipeline comparison and evaluate the various steps of analysis leading from an initial count matrix to a cluster assignment.
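To give a sense of how such a framework is used, the sketch below shows how a benchmark of alternative methods might be set up with pipeComp. It is a minimal sketch based on the package's documentation: the scrna_pipeline() and runPipeline() calls follow the interface described there, but the exact argument names, the method wrappers listed as alternatives and the dataset paths are illustrative assumptions to be checked against the installed version of the package.

```r
# Minimal sketch of a pipeComp benchmark (names to be verified against
# the installed package version; dataset paths are placeholders).
library(pipeComp)

# A PipelineDefinition bundles the ordered analysis steps together with
# the evaluation metrics computed after each step; scrna_pipeline() is
# assumed to return the scRNAseq pipeline used in this study.
pipDef <- scrna_pipeline()

# Alternative methods/parameters to combine at each step; the wrapper
# names below are illustrative and follow the package vignette.
alternatives <- list(
  doubletmethod = c("none", "scDblFinder"),
  filt          = c("filt.lenient", "filt.stringent"),
  norm          = c("norm.seurat", "norm.sctransform", "norm.scran"),
  sel           = "sel.vst",
  selnb         = 2000,
  dr            = "seurat.pca",
  clustmethod   = "clust.seurat",
  dims          = c(10, 15, 20, 30),
  resolution    = c(0.01, 0.1, 0.5, 0.8, 1)
)

# Benchmark datasets with known cell identities (placeholder file paths)
datasets <- c(mixology10x3cl = "mixology10x3cl.SCE.rds",
              Zhengmix4eq    = "Zhengmix4eq.SCE.rds")

# Run every combination of alternatives on every dataset and collect
# the multi-level evaluation metrics for downstream aggregation
res <- runPipeline(datasets, alternatives, pipDef, nthreads = 4)
```

Because every combination of alternatives is run on every dataset, the number of pipeline instances grows multiplicatively with the alternatives at each step; the framework's step-wise evaluation is what keeps such a combinatorial benchmark tractable.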