2023
DOI: 10.1186/s13059-023-02904-1

The shaky foundations of simulating single-cell RNA sequencing data

Abstract. Background: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, both on their own and in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for e…

Cited by 24 publications (9 citation statements)
References 114 publications
“…We use the splat simulation model implemented in the splatter R package to simulate scRNA-seq datasets [48]. The splat model is the most widely used simulation model in scRNA-seq benchmarking studies [49] and was recently shown to be in the top tier of simulation methods [50]. To simulate marker genes specifically, we designed and implemented a novel marker gene score to select and rank marker genes using the splat model’s cluster-specific differential expression parameters (Methods).…”
Section: Results (mentioning)
confidence: 99%
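To make "cluster-specific differential expression parameters" concrete, here is a minimal sketch of a splat-style simulation in NumPy. It is an illustration under stated assumptions, not splatter's code: it omits outlier genes, batches and dropout, and although the argument names mirror splatter's `de.prob`, `de.downProb`, `de.facLoc`, `de.facScale`, `bcv.common` and `bcv.df`, the values used here are only illustrative. The helper `top_markers_by_de_factor` is hypothetical and is not the marker gene score described in the statement above.

```python
import numpy as np


def simulate_splat_like(n_genes=1000, n_cells=300, group_probs=(0.5, 0.5),
                        mean_shape=0.6, mean_rate=0.3,
                        lib_loc=11.0, lib_scale=0.2,
                        de_prob=0.1, de_down_prob=0.5,
                        de_fac_loc=0.1, de_fac_scale=0.4,
                        bcv_common=0.1, bcv_df=60, seed=0):
    """Simplified splat-style simulation (sketch: no outliers, batches or dropout)."""
    rng = np.random.default_rng(seed)
    n_groups = len(group_probs)

    # Base gene means ~ Gamma(shape, rate); NumPy uses scale = 1 / rate.
    base_means = rng.gamma(mean_shape, 1.0 / mean_rate, size=n_genes)

    # Cluster-specific DE factors: each gene is DE in a group with prob de_prob,
    # the factor is log-normal, and it is inverted (down-regulated) with prob de_down_prob.
    de_factors = np.ones((n_genes, n_groups))
    for g in range(n_groups):
        is_de = rng.random(n_genes) < de_prob
        fac = rng.lognormal(de_fac_loc, de_fac_scale, size=n_genes)
        down = rng.random(n_genes) < de_down_prob
        fac = np.where(down, 1.0 / fac, fac)
        de_factors[:, g] = np.where(is_de, fac, 1.0)

    # Assign cells to groups and draw log-normal library sizes.
    groups = rng.choice(n_groups, size=n_cells, p=group_probs)
    lib_sizes = rng.lognormal(lib_loc, lib_scale, size=n_cells)

    # Group-adjusted means (genes x cells), rescaled so that each cell's
    # expected total count equals its library size.
    group_means = base_means[:, None] * de_factors[:, groups]
    cell_means = group_means / group_means.sum(axis=0) * lib_sizes

    # Biological coefficient of variation: a common term plus a mean-dependent
    # term, scaled per gene by a scaled inverse chi-squared draw.
    chi_scale = np.sqrt(bcv_df / rng.chisquare(bcv_df, size=n_genes))[:, None]
    bcv = (bcv_common + 1.0 / np.sqrt(cell_means)) * chi_scale

    # Gamma-Poisson sampling: Gamma with mean cell_means and CV bcv, then Poisson counts.
    true_means = rng.gamma(1.0 / bcv**2, cell_means * bcv**2)
    counts = rng.poisson(true_means)
    return counts, groups, de_factors


def top_markers_by_de_factor(de_factors, group, k=10):
    """Hypothetical illustrative ranking: log DE factor in `group` minus the mean
    log factor in the other groups. NOT the marker gene score from the citing study."""
    log_fac = np.log(de_factors)
    others = np.delete(log_fac, group, axis=1).mean(axis=1)
    score = log_fac[:, group] - others
    return np.argsort(score)[::-1][:k]


counts, groups, de_factors = simulate_splat_like()
print(counts.shape)
print(top_markers_by_de_factor(de_factors, group=0, k=5))
```

Because the DE factors are drawn from fixed distributions rather than estimated from a reference dataset, choosing values that reproduce real marker genes has to be done by comparing simulation runs against annotated data.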
“…A limitation of the splat model is that it does not allow these cluster-specific differential expression parameters to be estimated from data. Indeed, no current simulation scRNA-seq method incorporates this facility [49]. To overcome this limitation in practice we performed analyses comparing simulated and true marker genes to identify values of the parameters that were able to recapitulate true expert-annotated marker genes (Additional file 1: Figs.…”
Section: Results (mentioning)
confidence: 99%
“…The simulated datasets were generated using the SPARsim [7] package in R [8], which creates count matrices resembling real data by modeling the distribution of zeros with a Gamma-Multivariate hypergeometric distribution. In evaluations by Crowell et al. [9] and Cao et al. [10], SPARsim emerged as one of the top performing scRNA-seq simulators, closely mimicking real data properties. To set the SPARsim parameters, we estimated them from the 10X Genomics example datasets of human Jurkat and 293T cells from Zheng et al. The 293T sample of 1,718 cells is included as a built-in SPARsim dataset.…”
Section: Methods (mentioning)
confidence: 99%
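The two-step "Gamma then multivariate hypergeometric" scheme named in this statement can be sketched as follows. This is not SPARsim's implementation and says nothing about how SPARsim estimates its parameters; the per-gene `intensity` and `variability` inputs, the Gamma parameterisation (variance equal to `variability * intensity**2`) and the `urn_scale` factor that converts intensities into whole molecules are all hypothetical choices made for illustration. The sampling step uses NumPy's `Generator.multivariate_hypergeometric`.

```python
import numpy as np


def gamma_mvhg_counts(intensity, variability, lib_sizes, urn_scale=10.0, seed=0):
    """Sketch of Gamma biological noise followed by sampling without replacement.

    intensity, variability: per-gene mean and variability (hypothetical inputs).
    lib_sizes: target total count per cell.
    urn_scale: hypothetical factor turning intensities into integer molecule counts.
    """
    rng = np.random.default_rng(seed)
    intensity = np.asarray(intensity, dtype=float)
    variability = np.asarray(variability, dtype=float)
    n_genes, n_cells = len(intensity), len(lib_sizes)
    counts = np.zeros((n_genes, n_cells), dtype=np.int64)
    for j in range(n_cells):
        # Biological step: Gamma with mean = intensity and
        # variance = variability * intensity**2.
        expr = rng.gamma(1.0 / variability, intensity * variability)
        # Technical step: build an urn of whole molecules per gene, then draw
        # lib_sizes[j] molecules without replacement (capped at the urn size).
        urn = np.round(expr * urn_scale).astype(np.int64)
        n_draw = min(int(lib_sizes[j]), int(urn.sum()))
        counts[:, j] = rng.multivariate_hypergeometric(urn, n_draw)
    return counts


intensity = np.array([50.0, 5.0, 0.5, 0.05])
variability = np.array([0.1, 0.5, 1.0, 2.0])
counts = gamma_mvhg_counts(intensity, variability, lib_sizes=[100, 200, 150])
print(counts)
```

Drawing without replacement caps each gene's count at the number of its molecules in the urn, so genes with low intensity often receive a zero count; in this sketch that is where zeros come from.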
“…However, a systematic benchmarking study has not been done for sST. Prior studies have established frameworks for comparing single-cell transcriptomic and epigenomic methods, underscoring the necessity for standardized evaluation criteria and reference tissues for technology validation [69], since simulated single-cell and spatial data may not be reliable [10]. While sST technologies share common features, such as the use of spatial DNA barcodes analogous to cell barcodes in scRNA-seq, the methods diverge significantly in aspects like spatial resolution and the preparation of spatially barcoded oligo arrays [11].…”
Section: Main (mentioning)
confidence: 99%