2023
DOI: 10.1101/2023.03.16.533008
Preprint

OASIS: An interpretable, finite-sample valid alternative to Pearson’s X² for scientific discovery

Abstract: Contingency tables, data represented as counts matrices, are ubiquitous across quantitative research and data-science applications. Existing statistical tests are insufficient, however, as none are simultaneously computationally efficient and statistically valid for a finite number of observations. In this work, motivated by a recent application in reference-genome-free inference (Chaung et al., 2022), we develop OASIS (Optimized Adaptive Statistic for Inferring Structure), a family of statistical tests for con…
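For orientation, the classical baseline named in the title can be sketched in a few lines. The block below computes the standard Pearson X² statistic for a counts matrix; it is not OASIS, and the function name `pearson_x2` is a hypothetical helper introduced only for illustration. The p-value usually attached to this statistic comes from a chi-squared approximation that holds asymptotically, which is the finite-sample concern the abstract raises.

```python
def pearson_x2(table):
    """Pearson's X^2 statistic for a contingency table.

    `table` is a list of rows, each a list of non-negative counts.
    Hypothetical helper for illustration; this is the classical
    statistic, not the OASIS test described in the abstract.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells in empty rows or columns
                x2 += (observed - expected) ** 2 / expected
    return x2


if __name__ == "__main__":
    # A 2x2 table whose rows have clearly different column profiles.
    print(pearson_x2([[10, 20], [20, 10]]))
```

A table whose rows are proportional to each other gives a statistic of zero, since every observed count then equals its expected count.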

Citations: cited by 4 publications (3 citation statements)
References: 40 publications
“…Classical and parametric tests struggle to prioritize biologically important variation because they are overpowered in the context of noise generated from biochemical sampling, and may report inaccurate p-values. SPLASH’s test provides finite-sample valid p-value bounds, unlike Pearson’s chi-squared test, better controls false positive calls under commonly used models such as negative binomial for scRNA-seq (Buen Abad Najar, Yosef, and Lareau 2020; Baharav, Tse, and Salzman 2023), and performs inference independent of any cell metadata such as cell type.…”
Section: Introduction
mentioning
confidence: 99%
“…Additional outputs such as an asymptotically-valid p-value are also available. Additional statistical details regarding SPLASH are available in (Baharav, Tse, and Salzman 2024). For each anchor, SPLASH computes an "effect size", which falls within the range of 0 to 1.…”
Section: Splash's Statistical Methodology
mentioning
confidence: 99%
“…Step 2 involves merging the target sequences and their counts found for each anchor across all samples to build a contingency table of target counts for each unique anchor. The contingency table is then used to compute a statistically valid p-value through an unsupervised optimization (Baharav, Tse, and Salzman 2024; Chaung et al 2023) (Methods). This step is memory-frugal as the contingency table is represented by a sparse matrix and each contingency table is loaded into memory individually.…”
mentioning
confidence: 99%
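
The merging step described in the last citation statement can be sketched as follows. This is a minimal illustration and not SPLASH's actual implementation: the record layout `(sample, anchor, target, count)`, the function name `build_contingency_tables`, and the dictionary-based sparse storage are all assumptions made for the example.

```python
from collections import defaultdict


def build_contingency_tables(records):
    """Merge per-sample target counts into one sparse table per anchor.

    `records` is an iterable of (sample, anchor, target, count) tuples;
    this layout is an assumption for illustration only. Each table maps
    (target, sample) -> count and stores only the cells that were seen,
    which mimics a sparse matrix of target counts per anchor.
    """
    tables = defaultdict(lambda: defaultdict(int))
    for sample, anchor, target, count in records:
        tables[anchor][(target, sample)] += count
    return tables


if __name__ == "__main__":
    # Hypothetical records; anchors and targets are placeholder strings.
    records = [
        ("s1", "ANCHOR_A", "TARGET_1", 5),
        ("s2", "ANCHOR_A", "TARGET_1", 2),
        ("s2", "ANCHOR_A", "TARGET_2", 7),
        ("s1", "ANCHOR_B", "TARGET_3", 1),
        ("s1", "ANCHOR_A", "TARGET_1", 3),  # same cell seen again
    ]
    for anchor, table in build_contingency_tables(records).items():
        print(anchor, dict(table))
```

Keeping one table per anchor means each table can be processed on its own, in the spirit of the statement's note that tables are loaded into memory individually.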