2020
DOI: 10.1101/2020.11.17.387795
Preprint

scDesign2: a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured

Abstract: In the burgeoning field of single-cell transcriptomics, a pressing challenge is to benchmark various experimental protocols and numerous computational methods in an unbiased manner. Although dozens of simulators have been developed for single-cell RNA-seq (scRNA-seq) data, they lack the capacity to simultaneously achieve all the three goals: preserving genes, capturing gene correlations, and generating any number of cells with varying sequencing depths. To fill in this gap, here we propose scDesign2, an interp…

Cited by 24 publications (51 citation statements)
References 132 publications (208 reference statements)
“…The source code and data for reproducing the results are available at https://doi.org/10.5281/zenodo.4011311 [119]. Both the R package and the source code are under the MIT license.…”
Section: Supplementary Information (citation type: mentioning)
confidence: 99%
“…In the first case study, we use the Zheng8 dataset (measured by the 10x protocol) as the reference dataset. To generate the pseudo targeted gene profiling data, we use a new single-cell gene expression simulator that captures gene correlations, scDesign2 [39], to generate data with a 100-time higher per-cell sequencing depth. In the second case study, we use the PBMC10x dataset (measured by 10x protocol) as the reference dataset, and we use PBMCSmartseq (measured by Smart-Seq2) as the pseudo targeted gene profiling data because Smart-Seq2 has a higher per-gene sequencing depth than 10x does.…”
Section: Results (citation type: mentioning)
confidence: 99%
“…We use two datasets, Zheng8 and PBMC10x, as the reference scRNA-seq datasets. For Zheng8 dataset, we first use scDesign2 [39] to learn the underlying parameters, and then simulate a new dataset with same genes and cell types but 100 times higher sequencing depth compared to the Zheng8 dataset. For PBMC10x dataset, we use the PBMCSmartSeq dataset, which measures the exact same example and contains all genes measured in PBMC10x.…”
Section: Supplementary Materials (citation type: mentioning)
confidence: 99%
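
The fit-then-simulate idea in the statement above (learn per-gene parameters, then generate cells at a higher sequencing depth while keeping gene correlations) can be illustrated with a small Gaussian-copula sketch. This is not the scDesign2 implementation or its API: the function names, the two-gene setup, the Poisson marginals, and the use of a single multiplicative depth factor are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, mean):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(mean)."""
    u = min(u, 1 - 1e-9)  # guard against u == 1.0, which would never be reached
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def simulate_two_genes(n_cells, means, rho, depth_factor=1.0, seed=0):
    """Hypothetical helper: draw correlated counts for two genes.

    means        -- per-gene Poisson means, assumed already fitted to a reference
    rho          -- correlation of the latent Gaussian copula
    depth_factor -- multiplies every mean, e.g. 100 for 100x sequencing depth
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    cells = []
    for _ in range(n_cells):
        # Correlated standard normals (2x2 Cholesky factor written out).
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        # Normal CDF maps each value to a uniform; the Poisson quantile
        # function maps the uniform to a count with the desired marginal.
        counts = tuple(
            poisson_quantile(std_normal.cdf(z), m * depth_factor)
            for z, m in zip((z1, z2), means)
        )
        cells.append(counts)
    return cells


cells = simulate_two_genes(2000, means=(2.0, 3.0), rho=0.8, depth_factor=100)
```

Scaling the marginal means changes the depth, while the dependence between genes comes only from the latent normals, so the correlation structure carries over to the deeper data.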
“…We compared Clipper (Additional File 1: Section S7.4) with edgeR [4], MAST [56], Monocle3 [57], the two-sample t test, and the Wilcoxon rank-sum test (Additional File 1: Section S5.4), five methods that are either popular or reported to have comparatively top performance from a previous benchmark study [58]. To verify the FDR control, we used scDesign2, a flexible probabilistic simulator, to generate scRNA-seq count data with known true DEGs [59]. scDesign2 offers three key advantages that enable the generation of realistic semi-synthetic scRNA-seq count data: (1) it captures distinct marginal distributions of different genes; (2) it preserves gene-gene correlations; (3) it adapts to various scRNA-seq protocols.…”
Section: Results (citation type: mentioning)
confidence: 99%
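
Because simulated data come with known true DEGs, checking FDR control reduces to comparing a method's discoveries against that ground truth. A minimal sketch follows, assuming discoveries and true DEGs are given as collections of gene identifiers; the function name and the gene labels are hypothetical and are not taken from the cited paper.

```python
def fdp_and_power(discoveries, true_degs):
    """Return (false discovery proportion, power) for one simulated dataset."""
    discoveries, true_degs = set(discoveries), set(true_degs)
    false_positives = len(discoveries - true_degs)
    # By convention the FDP is 0 when nothing is discovered.
    fdp = false_positives / max(len(discoveries), 1)
    power = len(discoveries & true_degs) / len(true_degs) if true_degs else 0.0
    return fdp, power


# Hypothetical example: 4 genes called, 2 of them among the 3 true DEGs.
fdp, power = fdp_and_power(["g1", "g2", "g3", "g4"], ["g1", "g2", "g5"])
```

The FDR is the expectation of the FDP, so in practice the proportion would be averaged over many simulated replicates and compared with the target threshold.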