2018
DOI: 10.1101/477794
Preprint

Naught all zeros in sequence count data are the same

Abstract: Due to the advent and utility of high-throughput sequencing, modern biomedical research abounds with multivariate count data. Yet such sequence count data is often extremely sparse; that is, much of the data is zero values. Such zero values are well known to cause problems for statistical analyses. In this work we provide a systematic description of different processes that can give rise to zero values as well as the types of methods for addressing zeros in sequence count studies. Importantly, we systematicall…

Cited by 58 publications (85 citation statements)
References 82 publications
“…Excessive zeros in microbiome studies are common and can potentially skew data (41,73). Zeros come in multiple forms (74). Outlier zeros are due to extraneous conditions, structural zeros are due to the nature of different experimental groups, and sampling zeros are any other zero that may be due to low sampling depth.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…Relative abundance is given by $\pi_{ij} = x_{ij}/t_i$, where total transcripts $t_i = \sum_j x_{ij}$. Since $n_i \ll t_i$, there is a "competition to be counted" [33]; genes with large relative abundance $\pi_{ij}$ in the original cell are more likely to have nonzero UMI counts, but genes with small relative abundances may be observed with UMI counts of exact zeros. The UMI counts $y_{ij}$ are a multinomial sample of the true biological counts $x_{ij}$, containing only relative information about expression patterns in the cell [34,33].…”
Section: Multinomial Sampling Distribution For UMI Counts
Citation type: mentioning
confidence: 99%
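
The multinomial sampling described in the statement above can be illustrated with a short simulation. This is only a sketch under stated assumptions, not the cited authors' code: the true transcript counts, the UMI depth `n_umi` (standing in for the total UMI count of one cell), and the helper names `relative_abundance`, `simulate_umi_counts`, and `zero_probability` are all hypothetical. It relies on the standard fact that each component of a multinomial draw is marginally binomial, so a gene with relative abundance pi is observed as zero with probability (1 - pi)^n.

```python
import numpy as np


def relative_abundance(true_counts):
    """Relative abundance of each gene: true count divided by total transcripts."""
    x = np.asarray(true_counts, dtype=float)
    return x / x.sum()


def simulate_umi_counts(true_counts, n_umi, seed=0):
    """Draw n_umi UMIs from one cell as a single multinomial sample."""
    rng = np.random.default_rng(seed)
    return rng.multinomial(n_umi, relative_abundance(true_counts))


def zero_probability(true_counts, n_umi):
    """P(observed count == 0) per gene; each multinomial margin is binomial."""
    pi = relative_abundance(true_counts)
    return (1.0 - pi) ** n_umi


# Hypothetical cell: six genes with decreasing true transcript counts.
# The last gene is truly absent from the cell.
true_counts = np.array([50_000, 5_000, 500, 50, 5, 0])
n_umi = 200  # assumed sequencing depth, far smaller than total transcripts

umi_counts = simulate_umi_counts(true_counts, n_umi)
p_zero = zero_probability(true_counts, n_umi)

for x, y, p in zip(true_counts, umi_counts, p_zero):
    print(f"true={x:>6}  observed={y:>4}  P(zero)={p:.3f}")
```

In this toy setting the absent gene is zero at any depth, whereas the low-abundance genes are zero only because few UMIs are drawn, which corresponds to the sampling zeros mentioned in the first citation statement. Raising `n_umi` lowers `p_zero` for every gene that is actually present.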
“…For example, whereas under (39) the mean of … is (1 − …)…, under (40) it is …. More generally, these two different interpretations could lead to different inferences about, e.g., differential expression or clustering [66]. We argue that both theory and empirical evidence support the use of the Poisson measurement model, and not the ZIP measurement model, and that therefore analyses using the ZINB observation model should be derived and interpreted using (39) rather than (40).…”
Section: Appendix D Identifiability Of Measurement and Expression Mo…
Citation type: mentioning
confidence: 99%