2022
DOI: 10.1038/s41467-022-35519-4

A balanced measure shows superior performance of pseudobulk methods in single-cell RNA-sequencing analysis

Cited by 47 publications (42 citation statements)
References 9 publications
“…This small effective sample size of "large" datasets can introduce inflated false positive rates (eg, differentially expressed genes between cases and controls) when trying to reproduce the results in other datasets derived from different individuals using traditional statistics. [170][171][172] Here too, ML can be of great utility. Resources like the scArches database 38 store models previously trained on one or more datasets (eg, an unsupervised dimensionality reduction model trained on a large single-cell dataset of cell lines from controls).…”
Section: Unsupervised ML Approaches
mentioning
confidence: 99%
“…The authors identified 1,031 DEGs using this combinatorial approach – DEGs requiring an FDR <0.01 in the cell-level and an FDR<0.05 in the patient level analysis. It is important to note that this cell-level differential expression approach, also known as pseudoreplication, over-estimates the confidence in DEGs due to the statistical dependence between cells from the same patient not being considered 11,12,13,14 . When we inspect all DEGs identified at an FDR of 0.05 from the authors’ cell-level analysis, this number increases to 14,274.…”
Section: Main
mentioning
confidence: 99%
“…When we inspect all DEGs identified at an FDR of 0.05 from the authors’ cell-level analysis, this number increases to 14,274. Pseudobulk differential expression (DE) analysis has recently been proven to give optimal performance compared to both mixed models and pseudoreplication approaches 11,12,15,16 . It aggregates counts to individuals thus accounting for the dependence between an individual’s cells.…”
Section: Main
mentioning
confidence: 99%
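The aggregation step described in the statement above (summing counts to the individual before testing) can be sketched as follows. This is only an illustration under stated assumptions: the function name `pseudobulk`, the cell and donor labels, and the toy counts are all hypothetical and do not come from the cited papers, and the downstream differential expression test is not shown.

```python
def pseudobulk(counts, cell_to_individual):
    """Sum per-cell gene counts into one count vector per individual.

    counts: dict mapping cell id -> list of gene counts (same gene order).
    cell_to_individual: dict mapping cell id -> individual id.
    Both structures are hypothetical stand-ins for a count matrix and its metadata.
    """
    totals = {}
    for cell, gene_counts in counts.items():
        individual = cell_to_individual[cell]
        if individual not in totals:
            totals[individual] = [0] * len(gene_counts)
        # Add this cell's counts gene by gene to its individual's running total.
        totals[individual] = [a + b for a, b in zip(totals[individual], gene_counts)]
    return totals


# Toy data (made up): three cells, three genes, two individuals.
cells = {"c1": [1, 0, 3], "c2": [2, 1, 0], "c3": [0, 4, 1]}
donor = {"c1": "p1", "c2": "p1", "c3": "p2"}
bulk = pseudobulk(cells, donor)
```

After aggregation each individual contributes a single observation per gene, so the sample size seen by the test is the number of individuals rather than the number of cells.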
“…If two tests have different type 1 error rates, the MCC can favor a test which fails to control the type 1 error rate. In fact, as illustrated in Figure 1 of Murphy and Skene's manuscript, failing to account for the within-sample correlation causes very high type 1 error rates (>0.50) and yet yields high MCC values 5 . Conversely, as observed from Eq.…”
mentioning
confidence: 99%
“…In addition to the limitations of MCC and ROC curves for comparing hypothesis tests, there are several misleading components we would like to address 5 . First, in Supplementary Figure 1 of Murphy and Skene, the two-part hurdle model is missing and adding it would reveal how much closer the two-part hurdle model is to the nominal p-value than the pseudobulk methods.…”
mentioning
confidence: 99%