2019
DOI: 10.1073/pnas.1820006116

scMerge leverages factor analysis, stable expression, and pseudoreplication to merge multiple single-cell RNA-seq datasets

Abstract: Concerted examination of multiple collections of single-cell RNA sequencing (RNA-seq) data promises further biological insights that cannot be uncovered with individual datasets. Here we present scMerge, an algorithm that integrates multiple single-cell RNA-seq datasets using factor analysis of stably expressed genes and pseudoreplicates across datasets. Using a large collection of public datasets, we benchmark scMerge against published methods and demonstrate that it consistently provides improved cell type s…

Citations: cited by 154 publications (173 citation statements)
References: 61 publications
“…We used scMerge [21] to remove the differences between two batches in the PBMC data collection [9]. PBMC dataset generated from two samples using five different protocols and then construct the learning curve.…”
Section: Learning Curve Construction With Data Integration | Type: mentioning
confidence: 99%
“…We used scMerge [21] to remove batch effects between two samples in the PBMC data collection generated from five different protocols [9] and next constructed the learning curves. We found that in most of the cases (SMART-seq, CEL-seq, and inDrops), the learning curve based on batch-corrected data achieved a higher accuracy rate than the uncorrected data (Supplementary Fig.…”
Section: Learning Curve Construction With Data Integration | Type: mentioning
confidence: 99%
“…CFD is also able to identify cases in which none of the input fits the pattern of consistently high values. Previous methods such as scMerge [14] rank features by some conservation metric and pick the top n% as the conserved features, for some user-specified n. When no input features are truly consistently high, an approach like this will simply result in a list of false positives. In contrast, we find that CFD returns no statistically significant features when all input features are either high mean and high variance, or low mean and low variance, across 14 different parameter settings (Figure 2b).…”
Section: Simulation Data | Type: mentioning
confidence: 99%
“…Identification of features in single-cell data that are stable across cells has recently become an important problem as single-cell data becomes more available and prevalent in genomic and epigenomic analyses. There have been a few methods developed specifically for the discovery of so-called stably expressed genes (scSEGs) in single-cell RNA sequencing (scRNAseq) [10,14]. Recently, a method called scMerge [14] used a Gamma-Gaussian mixture model to compute certain characteristics related to stability that are then used to rank genes by an "SEG index," which is the average rank across these stability properties.…”
Section: Introduction | Type: mentioning
confidence: 99%
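
The two statements above describe ranking genes by an index that averages ranks across several stability properties, then keeping the top n%. A minimal Python sketch of that generic idea follows. It is not scMerge's implementation: the Gamma-Gaussian mixture model is omitted, the stability metrics are assumed to be precomputed, and the function names `average_rank_index` and `top_fraction` are hypothetical.

```python
import numpy as np


def average_rank_index(metrics: np.ndarray) -> np.ndarray:
    """Average per-metric ranks into one index in [0, 1].

    metrics: array of shape (n_genes, n_metrics) with n_genes >= 2, where a
    larger value is assumed to mean "more stable" for every column.
    The columns are hypothetical, precomputed stability properties.
    """
    n_genes = metrics.shape[0]
    # argsort of argsort yields 0-based ranks per column
    # (ties are broken by position, which is a simplification).
    ranks = metrics.argsort(axis=0).argsort(axis=0)
    # Average the ranks across metrics and scale to [0, 1].
    return ranks.mean(axis=1) / (n_genes - 1)


def top_fraction(index: np.ndarray, frac: float) -> np.ndarray:
    """Return the positions of the top `frac` of genes by index, best first."""
    k = max(1, int(round(frac * index.size)))
    return np.argsort(index)[::-1][:k]


# Toy input: 4 genes, 2 stability metrics (illustrative values only).
toy = np.array([
    [0.9, 0.8],
    [0.1, 0.2],
    [0.5, 0.6],
    [0.3, 0.4],
])
toy_index = average_rank_index(toy)
toy_selected = top_fraction(toy_index, 0.5)
```

Because `top_fraction` always returns at least one gene, a rank-and-threshold scheme of this kind selects features even when none is truly stable, which is the limitation the CFD statement points out.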