2020
DOI: 10.3389/fonc.2020.00973

Impact of Data Preprocessing on Integrative Matrix Factorization of Single Cell Data

Abstract: Integrative, single-cell analyses may provide unprecedented insights into cellular and spatial diversity of the tumor microenvironment. The sparsity, noise, and high dimensionality of these data present unique challenges. Whilst approaches for integrating single-cell data are emerging and are far from being standardized, most data integration, cell clustering, cell trajectory, and analysis pipelines employ a dimension reduction step, frequently principal component analysis (PCA), a matrix factorization method …

Cited by 12 publications (24 citation statements)
References 72 publications (103 reference statements)
“…2014 ). Likewise, transcriptomes of single cells are often nonindependent due to spatial or temporal autocorrelation, which might create similar artifacts such as the horseshoe effect ( Edelaar 2013 ; Hsu and Culhane 2020 ). Although we did not simulate the evolution of optimized traits, our results suggest that, even when the phenotypes are Pareto optimal, the Pareto front and archetypes identified by ParTI may be biased by the phylogenetic relationship or population structure in the data.…”
Section: Discussion (mentioning)
confidence: 99%
“…Critically, one recent report has sparked interest by suggesting that adding nonlinear interactions to data modelling may provide negligible improvement over linear models when comparing brain-behavior interactions [130]. In selecting the best models (linear or nonlinear) might also depend on data preprocessing [131,132]. The effectiveness of data preprocessing may vary depending on the characteristics of the data domains.…”
Section: Discussion (mentioning)
confidence: 99%
“…Because of the size and complexity of the data, the datasets are orders of magnitude greater than those encountered when analyzing "bulk" RNAseq data from tissue samples. While such fine resolution data have the potential to reveal new biological findings, scRNAseq data exhibit sparsity, noisiness, and technical artefacts beyond those seen for bulk RNA samples (1,2), necessitating scRNAseq specific pre-processing and normalization (3,4). Typically scRNAseq analysis includes the use of dimension reduction, as it attenuates noise and ensures computational tractability, but the choice of method considerably influences downstream analyses, results, and conclusions (3,5).…”
Section: Introduction (mentioning)
confidence: 99%
“…As such, use of principal component analysis (PCA) requires that discrete and sparse scRNAseq count data be transformed prior to dimension reduction with this method (6). PCA is a linear dimension reduction method that obtains a low-dimensional data representation along orthogonal linear axes such that the proportion of variance accounted on each axis is maximized in Euclidean space (4,8-11). Because PCA is most suitable for continuous data that is approximately normally distributed, it may exhibit artefacts when applied to data with gradients or noncontinuous data (such as counts); one such artefact, called the "arch" or "horseshoe" effect, has been found to occur when PCA is applied to scRNAseq data without log-transformation (4,6,12).…”
Section: Introduction (mentioning)
confidence: 99%
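
The statement above describes transforming sparse count data before PCA. A minimal sketch of that idea follows, assuming a cells-by-genes count matrix, library-size scaling followed by log(1 + x), and PCA computed through an SVD of the column-centered matrix. The function names (`log_normalize`, `pca_svd`), the scale factor of 10,000, and the simulated Poisson counts are illustrative assumptions, not the preprocessing used in the cited paper.

```python
import numpy as np


def log_normalize(counts, scale=1e4):
    """Scale each cell (row) to a common library size, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)
    lib_size[lib_size == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / lib_size * scale)


def pca_svd(x, n_components=2):
    """PCA via SVD of the column-centered matrix (rows = cells, columns = genes)."""
    centered = x - x.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # cell coordinates
    explained = s**2 / np.sum(s**2)                   # variance proportion per axis
    return scores, vt[:n_components], explained[:n_components]


# Simulated sparse counts (hypothetical data): 200 cells, 50 genes.
rng = np.random.default_rng(0)
gene_means = rng.gamma(0.5, 2.0, size=(1, 50))
counts = rng.poisson(lam=gene_means, size=(200, 50))

scores, axes, explained = pca_svd(log_normalize(counts), n_components=2)
```

Here `axes` holds the orthogonal linear axes and `explained` the proportion of variance on each one, in decreasing order. Running `pca_svd` on the raw `counts` instead of the log-transformed matrix is one way to compare how the transformation changes the embedding.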