2021
DOI: 10.1038/s41467-021-21038-1

A practical solution to pseudoreplication bias in single-cell studies

Abstract: Cells from the same individual share common genetic and environmental backgrounds and are not statistically independent; therefore, they are subsamples or pseudoreplicates. Thus, single-cell data have a hierarchical structure that many current single-cell methods do not address, leading to biased inference, highly inflated type 1 error rates, and reduced robustness and reproducibility. This includes methods that use a batch effect correction for individual as a means of accounting for within-sample correlation…

Cited by 174 publications (211 citation statements)
References: 45 publications
“…One potential disadvantage of this strategy is that it cannot adjust for cell-level covariates such as mitochondrial and ribosomal gene concentration to increase the statistical power. The pseudo-bulk method might also be underpowered when the CPS values are imbalanced across the subjects 41 .…”
Section: Discussion (mentioning)
confidence: 99%
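
The pseudo-bulk strategy discussed in this statement can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function name `pseudobulk`, the toy gene and cell identifiers, and the reading of "CPS" as the number of cells per subject are all assumptions. Summing counts within each subject gives one observation per subject, which removes the pseudoreplication but also discards cell-level covariates. The per-subject cell counts returned alongside make any imbalance visible.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_to_subject):
    """Sum per-cell gene counts within each subject.

    cell_counts:     {cell_id: {gene: count}}  (hypothetical input layout)
    cell_to_subject: {cell_id: subject_id}
    Returns (per-subject gene totals, number of cells per subject).
    """
    totals = defaultdict(lambda: defaultdict(int))
    cells_per_subject = defaultdict(int)
    for cell_id, counts in cell_counts.items():
        subject = cell_to_subject[cell_id]
        cells_per_subject[subject] += 1
        for gene, n in counts.items():
            totals[subject][gene] += n
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {s: dict(g) for s, g in totals.items()}, dict(cells_per_subject)


# Toy data: two cells from subject s1, one cell from subject s2.
cells = {
    "c1": {"GENE_A": 3, "GENE_B": 0},
    "c2": {"GENE_A": 1, "GENE_B": 2},
    "c3": {"GENE_A": 5, "GENE_B": 1},
}
subject_of = {"c1": "s1", "c2": "s1", "c3": "s2"}

bulk, n_cells = pseudobulk(cells, subject_of)
print(bulk)
print(n_cells)
```

After aggregation, any downstream test sees one row per subject, so cell-level variables such as mitochondrial fraction can only enter as per-subject summaries.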
“…The statistical analysis was performed using the limma package in R 64 , using the default configuration and the following linear model: ~pathology+nFeature+pc_mito, where pathology is the average immunohistochemistry quantification value for Aβ or pTau, nFeature is the total number of distinct features expressed in each nucleus (to account for the fact that nuclei that express a higher number of features may have higher AUCell scores) and pc_mito is the percentage of counts mapping to mitochondrial genes. We also corrected for a potential pseudoreplication bias 65 , by using the duplicateCorrelation function of the limma package with the sample as the "blocking" variable.…”
Section: Gene Co-expression (Module) Regulatory Network (Regulon) and Enrichment Analyses (mentioning)
confidence: 99%
“…Log2, CPM), and co-variate scaling and centering. The default DGE method in scFlow is a generalized linear mixed model (GLMM) with a random effect (RE) term (e.g., to account for correlations within individual samples) as implemented within the model-based analysis of single-cell transcriptomics (MAST) algorithm (Zimmerman et al, 2021; Finak et al, 2015). An interactive DGE HTML report with a volcano plot and searchable tables is generated, including details of model parameters, inputs, and outputs (File 5).…”
Section: Differential Gene Expression and Impacted Pathway Analysis (mentioning)
confidence: 99%
“…A log2(TPM + 1) expression matrix is calculated from the raw counts matrix, and a two-part (i.e., including a discrete logistic regression component for expression rate and a continuous Gaussian component conditioned on each cell expressing a gene) generalized regression model is fit independently for each gene. The CDR is included as a covariate alongside additional user-specified experimental covariates, which can include, for example, the individual sample as a random effect (Zimmerman et al, 2021). False-discovery rate (FDR) adjusted p-values are determined using the Benjamini & Hochberg method.…”
Section: Differential Gene Expression (mentioning)
confidence: 99%
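
The Benjamini & Hochberg adjustment mentioned in this statement can be illustrated with a short standard-library sketch. The function name `bh_adjust` and the example p-values are hypothetical and are not taken from the cited pipeline. Each p-value is scaled by m/rank, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values, one per gene (made up for this example).
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
print(adj)
```

Genes whose adjusted value falls below the chosen FDR threshold would be reported as differentially expressed. Note that the adjustment only controls the false-discovery rate if the per-gene p-values are themselves valid, which is the point of modelling the individual as a random effect.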