2019
DOI: 10.1101/574574
Preprint

Feature Selection and Dimension Reduction for Single Cell RNA-Seq based on a Multinomial Model

Abstract: Single cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero-inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-P…

Cited by 140 publications (296 citation statements)
References 51 publications
“…If the scale parameter were also held fixed, the QUMI target distribution would be identical for every cell and would predict a constant zero fraction across cells. But this is discordant with the fact that UMI count data exhibit variation in the zero fraction across cells [6]. Since the varying zero fractions in read counts exactly match the zero fractions in underlying UMI counts, it would be inappropriate to alter these correct expression values by normalizing to a global target distribution.…”
Section: Quantile Normalization Of Read Counts To Quasi-UMIs (mentioning)
confidence: 99%
“…This is a result of cell-to-cell differences in capture and RT efficiency, which has nothing to do with underlying biology. For UMI counts, systematic variation introduced by these technical components can be addressed by using multinomial models [6]. However, for read counts, such models are precluded by the additional multiplicative distortions of PCR.…”
Section: Introduction (mentioning)
confidence: 99%
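
The statements above rest on two points from the cited preprint: UMI counts behave like multinomial samples, and the fraction of zeros differs from cell to cell. A minimal sketch of how these fit together, assuming a single shared vector of gene proportions and cells that differ only in total UMI count: under a multinomial with total N, a gene with proportion p is zero with probability (1 - p)^N, so cells with fewer total counts show more zeros without any zero-inflation term. The gene proportions, the totals, and the function names below are hypothetical and chosen only for illustration; this is not the preprint's code.

```python
import random
from collections import Counter


def expected_zero_fraction(probs, total):
    """Mean over genes of P(count == 0) = (1 - p)^total under a multinomial."""
    return sum((1.0 - p) ** total for p in probs) / len(probs)


def simulate_zero_fraction(probs, total, rng):
    """Draw one cell from a multinomial and return its fraction of zero-count genes."""
    genes = range(len(probs))
    # Each of the `total` molecules is assigned to a gene with probability probs[gene].
    draws = rng.choices(genes, weights=probs, k=total)
    observed = Counter(draws)
    return 1.0 - len(observed) / len(probs)


# Hypothetical relative abundances for 200 genes (assumption, not real data).
raw = [1.0 / (rank + 1) for rank in range(200)]
total_weight = sum(raw)
probs = [w / total_weight for w in raw]

rng = random.Random(0)
for total in (100, 1000, 10000):
    sim = simulate_zero_fraction(probs, total, rng)
    exp = expected_zero_fraction(probs, total)
    print(f"total={total:>5}  simulated={sim:.3f}  expected={exp:.3f}")
```

In this sketch the zero fraction falls as the total count rises, even though every simulated cell shares the same underlying proportions. That is the sense in which varying zero fractions can arise from sampling depth alone.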