A balanced measure shows superior performance of pseudobulk methods in single-cell RNA-sequencing analysis

Murphy, Alan E; Skene, Nathan G.

doi:10.1038/s41467-022-35519-4

Cited by 47 publications

(42 citation statements)

References 9 publications


“…This small effective sample size of "large" datasets can introduce inflated false positive rates (eg, differentially expressed genes between cases and controls) when trying to reproduce the results in other datasets derived from different individuals using traditional statistics. [170][171][172] Here too, ML can be of great utility. Resources like the scArches database 38 store models previously trained on one or more datasets (eg, an unsupervised dimensionality reduction model trained on a large single-cell dataset of cell lines from controls).…”

Section: Unsupervised ML Approaches

mentioning

confidence: 99%

Artificial intelligence for neurodegenerative experimental models

Marzi, Schilder, Nott et al. 2023

Alzheimer's & Dementia


INTRODUCTION: Experimental models are essential tools in neurodegenerative disease research. However, the translation of insights and drugs discovered in model systems has proven immensely challenging, marred by high failure rates in human clinical trials.

METHODS: Here we review the application of artificial intelligence (AI) and machine learning (ML) in experimental medicine for dementia research.

RESULTS: Considering the specific challenges of reproducibility and translation between other species or model systems and human biology in preclinical dementia research, we highlight best practices and resources that can be leveraged to quantify and evaluate translatability. We then evaluate how AI and ML approaches could be applied to enhance both cross-model reproducibility and translation to human biology, while sustaining biological interpretability.

DISCUSSION: AI and ML approaches in experimental medicine remain in their infancy. However, they have great potential to strengthen preclinical research and translation if based upon adequate, robust, and reproducible experimental data.

Highlights: There are increasing applications of AI in experimental medicine. We identified issues in reproducibility, cross-species translation, and data curation in the field. Our review highlights data resources and AI approaches as solutions. Multi-omics analysis with AI offers exciting future possibilities in drug discovery.


“…The authors identified 1,031 DEGs using this combinatorial approach – DEGs requiring an FDR < 0.01 in the cell-level and an FDR < 0.05 in the patient-level analysis. It is important to note that this cell-level differential expression approach, also known as pseudoreplication, over-estimates the confidence in DEGs due to the statistical dependence between cells from the same patient not being considered 11,12,13,14. When we inspect all DEGs identified at an FDR of 0.05 from the authors’ cell-level analysis, this number increases to 14,274.…”

Section: Main

mentioning

confidence: 99%
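The pseudoreplication problem described in the statement above can be illustrated with a small null simulation. This is a minimal sketch under stated assumptions, not the analysis from any of the cited papers: expression is modelled as Gaussian rather than as counts, there is no true group difference, and all names and parameters (`simulate_null`, `donor_sd`, `cells_per_donor`, and so on) are hypothetical. The cell-level test treats every cell as an independent observation. The donor-level test first averages each donor's cells, which is a simplified stand-in for pseudobulk aggregation. The critical values are approximations: 1.96 (normal) for the cell-level test and 2.101 (t distribution, 18 degrees of freedom) for the donor-level test.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def simulate_null(n_genes=200, donors_per_group=10, cells_per_donor=50,
                  donor_sd=1.0, cell_sd=1.0, seed=0):
    """Return (cell-level, donor-level) false positive rates under the null.

    Every gene is simulated with no group effect, so any rejection is a
    false positive. Cells from one donor share that donor's random effect,
    which is what makes them statistically dependent.
    """
    rng = random.Random(seed)
    cell_hits = donor_hits = 0
    for _ in range(n_genes):
        group_cells, group_means = [], []
        for _group in range(2):
            cells, means = [], []
            for _donor in range(donors_per_group):
                donor_effect = rng.gauss(0.0, donor_sd)
                donor_cells = [donor_effect + rng.gauss(0.0, cell_sd)
                               for _ in range(cells_per_donor)]
                cells.extend(donor_cells)
                means.append(statistics.mean(donor_cells))
            group_cells.append(cells)
            group_means.append(means)
        # Pseudoreplication: each cell counted as an independent sample.
        if abs(welch_t(group_cells[0], group_cells[1])) > 1.96:
            cell_hits += 1
        # Donor-level: one aggregated value per donor (approximate t cutoff).
        if abs(welch_t(group_means[0], group_means[1])) > 2.101:
            donor_hits += 1
    return cell_hits / n_genes, donor_hits / n_genes


cell_fpr, donor_fpr = simulate_null()
print(f"cell-level false positive rate:  {cell_fpr:.2f}")
print(f"donor-level false positive rate: {donor_fpr:.2f}")
```

With these settings the cell-level rate should land far above the nominal 5% level while the donor-level rate stays close to it; the exact figures depend on the seed and the assumed variance components.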

“…When we inspect all DEGs identified at an FDR of 0.05 from the authors’ cell-level analysis, this number increases to 14,274. Pseudobulk differential expression (DE) analysis has recently been proven to give optimal performance compared to both mixed models and pseudoreplication approaches 11,12,15,16. It aggregates counts to individuals, thus accounting for the dependence between an individual’s cells.…”

Section: Main

mentioning

confidence: 99%
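The aggregation step mentioned in the statement above ("aggregates counts to individuals") can be sketched as follows. This is an illustrative example only: the function name `pseudobulk`, the dictionary layout, and the gene, cell, and donor identifiers are all hypothetical, and real pipelines typically aggregate within each cell type and operate on sparse matrices rather than nested dictionaries.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_to_donor):
    """Sum per-cell gene counts into one count profile per donor.

    cell_counts:   dict mapping cell id -> dict of gene -> raw count
    cell_to_donor: dict mapping cell id -> donor id
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell_id, gene_counts in cell_counts.items():
        donor = cell_to_donor[cell_id]
        for gene, count in gene_counts.items():
            totals[donor][gene] += count
    # Convert nested defaultdicts to plain dicts for the caller.
    return {donor: dict(genes) for donor, genes in totals.items()}


# Hypothetical toy data: four cells from two donors.
cell_counts = {
    "cell1": {"GENE_A": 3, "GENE_B": 0},
    "cell2": {"GENE_A": 1, "GENE_B": 2},
    "cell3": {"GENE_A": 0, "GENE_B": 5},
    "cell4": {"GENE_A": 4, "GENE_B": 1},
}
cell_to_donor = {
    "cell1": "donor1", "cell2": "donor1",
    "cell3": "donor2", "cell4": "donor2",
}

donor_counts = pseudobulk(cell_counts, cell_to_donor)
print(donor_counts)
```

After aggregation each donor contributes a single observation per gene, so a downstream differential expression test sees one sample per individual rather than one per cell.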

Avoiding false discoveries: Revisiting an Alzheimer’s disease snRNA-Seq dataset

Murphy, Fancy, Skene 2023

Preprint

Self Cite


Arising From: Mathys, H. et al. Nature (2019). https://doi.org/10.1038/s41586-019-1195-2

Mathys et al. conducted the first single-nucleus RNA-Seq (snRNA-Seq) study of Alzheimer's disease (AD). The authors profiled the transcriptomes of approximately 80,000 cells from the prefrontal cortex, collected from 48 individuals, 24 of whom presented with varying degrees of AD pathology. With bulk RNA-Seq, changes in gene expression across cell types can be lost, potentially masking the differentially expressed genes (DEGs) across different cell types. Through the use of single-cell techniques, the authors benefitted from increased resolution with the potential to uncover cell type-specific DEGs in AD for the first time. However, there were limitations in both their data processing and quality control and their differential expression analysis. Here, we correct these issues with best-practice approaches to snRNA-Seq processing and differential expression, resulting in 892 times fewer differentially expressed genes at a false discovery rate (FDR) of 0.05.


“…If two tests have different type 1 error rates, the MCC can favor a test which fails to control the type 1 error rate. In fact, as illustrated in Figure 1 of Murphy and Skene's manuscript, failing to account for the within-sample correlation causes very high type 1 error rates (>0.50) and yet yields high MCC values 5 . Conversely, as observed from Eq.…”

mentioning

confidence: 99%
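The point made in the statement above, that the MCC can favor a test which fails to control the type 1 error rate, can be made concrete with a small worked example. The sketch below uses the standard definition of the Matthews correlation coefficient; the confusion-matrix counts for the two tests are hypothetical numbers chosen for illustration, not results from any of the papers quoted here.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a margin is empty
    return (tp * tn - fp * fn) / denom


def type1_error_rate(fp, tn):
    """Fraction of truly null genes that the test calls significant."""
    return fp / (fp + tn)


# Hypothetical benchmark: 900 truly DE genes and 100 truly null genes.
liberal = {"tp": 880, "fn": 20, "fp": 55, "tn": 45}       # poor error control
conservative = {"tp": 600, "fn": 300, "fp": 5, "tn": 95}  # nominal error control

for name, counts in (("liberal", liberal), ("conservative", conservative)):
    rate = type1_error_rate(counts["fp"], counts["tn"])
    print(f"{name}: type 1 error rate = {rate:.2f}, MCC = {mcc(**counts):.3f}")
```

In this constructed case the liberal test rejects more than half of the null genes yet scores the higher MCC, because its extra true positives outweigh its false positives in the coefficient. Whether the same happens in a real benchmark depends on the proportion of truly DE genes and on each test's power.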

“…In addition to the limitations of MCC and ROC curves for comparing hypothesis tests, there are several misleading components we would like to address 5 . First, in Supplementary Figure 1 of Murphy and Skene, the two-part hurdle model is missing and adding it would reveal how much closer the two-part hurdle model is to the nominal p-value than the pseudobulk methods.…”

mentioning

confidence: 99%