2021
DOI: 10.1038/s41592-021-01336-8

Benchmarking atlas-level data integration in single-cell genomics

Abstract: Single-cell atlases often include samples that span locations, laboratories and conditions, leading to complex, nested batch effects in data. Thus, joint analysis of atlas datasets requires reliable data integration. To guide integration method choice, we benchmarked 68 method and preprocessing combinations on 85 batches of gene expression, chromatin accessibility and simulation data from 23 publications, altogether representing >1.2 million cells distributed in 13 atlas-level integration tasks. We evaluate…

Cited by 691 publications
(1,064 citation statements)
References 56 publications
“…Recent efforts to benchmark batch correction methods for single-cell RNA-seq all came to the conclusion that no single method emerges as the best performer in every dataset [ 23 , 33 , 34 ]. They also showed the importance of the selection of highly variable genes and the limitation of different types of outputs on the downstream analysis.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…To normalize and scale the single-cell gene expression data, we imported the filtered single cells and the UMI count matrices into the Seurat (v3.1.5) R package (v3.5.2, https://www.R-project.org/ ) ( Satija et al, 2015 ; Butler et al, 2018 ). By the Seurat “FindVariableGenes” function, we determined the highly variable genes (HGVs) across the single cells in each sample ( Luecken et al, 2021 ). Following principal component analysis (PCA), the first 20 principal components were selected for cell clustering and dimension reduction visualized by t-SNE and UMAP plots at a resolution of 0.5 ( Supplementary Figure S3 ).…”
Section: Methods (citation type: mentioning)
confidence: 99%
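
The highly variable gene selection mentioned in the statement above can be illustrated with a minimal sketch. This is not Seurat's implementation: it ranks genes by a simple dispersion (variance divided by mean) on a toy count matrix. The function name `top_variable_genes`, the gene names and the counts are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pvariance


def top_variable_genes(counts, gene_names, n_top):
    """Rank genes by dispersion (variance / mean) and return the top n_top names.

    counts: list of rows, one row per cell, one count per gene in each row.
    This is an illustrative stand-in, not the method used by any specific tool.
    """
    dispersions = []
    for j, name in enumerate(gene_names):
        column = [row[j] for row in counts]
        mu = mean(column)
        # Genes with zero mean carry no signal; assign them zero dispersion.
        disp = pvariance(column) / mu if mu > 0 else 0.0
        dispersions.append((disp, name))
    # Highest dispersion first; ties are broken by gene name for determinism.
    dispersions.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in dispersions[:n_top]]


# Toy data (hypothetical): 4 cells x 3 genes.
toy_counts = [
    [5, 0, 1],
    [5, 9, 1],
    [5, 0, 2],
    [5, 9, 0],
]
toy_genes = ["geneA", "geneB", "geneC"]

selected = top_variable_genes(toy_counts, toy_genes, n_top=2)
print(selected)
```

In practice, tools typically normalize the counts and model the mean-variance relationship before ranking genes, so a raw dispersion ranking like this one should be read only as a conceptual outline.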
“…As DL models for single-cell data analysis have not matured, it may be valuable to run multiple tools to see how they compare. Furthermore, comprehensive single-cell DL benchmarking papers help users choose the best model 8 , 9 .…”
Section: Best Practices In Applying Deep Learning In Single-cell Biology (citation type: mentioning)
confidence: 99%