2019
DOI: 10.1101/758524
Preprint

Data-based RNA-seq Simulations by Binomial Thinning

Abstract: With the explosion in the number of methods designed to analyze bulk and single-cell RNA-seq data, there is a growing need for approaches that assess and compare these methods. The usual technique is to compare methods on data simulated according to some theoretical model. However, because real data often violate theoretical models, this can result in unsubstantiated claims about a method's performance. Rather than generate data from a theoretical model, in this paper we develop methods to add signal to…

Cited by 1 publication (1 citation statement)
References 106 publications (114 reference statements)
“…In real data sets, the true value of is unknown. To side-step this problem, we used binomial thinning 49,50 of real data sets (independently proposed as molecular cross-validation 51 ) to produce training and validation data with the same true expression matrix . The key idea of this approach is that a Binomial sample of a Poisson-distributed count is also (marginally) Poisson-distributed, meaning it can produce such training/validation data without requiring knowledge of the true .…”
Section: Multi-gene Models (mentioning)
confidence: 99%
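
The splitting idea described in the citation statement can be illustrated with a minimal sketch. It is not the code of the preprint or of the citing paper: the function name `binomial_thin_split`, the thinning probability `p`, and the simulated gamma/Poisson count matrix are all assumptions made for illustration. Each observed count is split by a Binomial draw into a training part and a validation part. If the original count is Poisson with some unknown rate, the two parts are Poisson with rates `p` and `1 - p` times that rate, so the split never needs the true rate itself.

```python
import numpy as np


def binomial_thin_split(counts, p=0.5, seed=0):
    """Split a count matrix into training and validation counts.

    Hypothetical helper for illustration only. Each entry n is split
    as train ~ Binomial(n, p) and valid = n - train.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    train = rng.binomial(counts, p)  # element-wise Binomial(n, p) draws
    valid = counts - train           # remainder goes to validation
    return train, valid


# Stand-in for a real data set: simulated Poisson counts (genes x samples).
# With real data, `counts` would be the observed matrix and the rates
# would be unknown.
rng = np.random.default_rng(1)
true_rate = rng.gamma(shape=2.0, scale=5.0, size=(200, 50))
counts = rng.poisson(true_rate)

train, valid = binomial_thin_split(counts, p=0.5, seed=2)
```

By construction the two parts always sum back to the observed counts, so no reads are created or lost. Only the allocation between training and validation is random.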