2021
DOI: 10.48550/arxiv.2104.08279
Preprint

Testing for Outliers with Conformal p-values

Abstract: This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rat…
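As a rough illustration of the idea in the abstract, the sketch below computes marginal conformal p-values for test points from a held-out calibration set of inlier scores. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `conformal_pvalues` is hypothetical, the scores are assumed to come from some already-fitted one-class model, and a larger score is assumed to mean "more outlying".

```python
import numpy as np

def conformal_pvalues(cal_scores, test_scores):
    """Marginal conformal p-values (illustrative sketch).

    Assumption: larger score = more outlying, and cal_scores were computed
    on inlier data that was NOT used to fit the scoring model.
    """
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = cal.size
    # For each test score, count calibration scores that are >= it.
    n_geq = n - np.searchsorted(cal, test, side="left")
    # The "+1" terms count the test point itself, so no p-value is ever zero.
    return (1.0 + n_geq) / (n + 1.0)

# Toy scores, made up purely for illustration.
cal_scores = [0.1, 0.2, 0.3, 0.4]
test_scores = [0.25, 0.5, 0.0]
pvals = conformal_pvalues(cal_scores, test_scores)
```

Because every test point is compared against the same calibration scores, the resulting p-values are dependent across test points, which is the dependence the abstract refers to.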

Cited by 17 publications (68 citation statements)
References 68 publications
“…Conformal prediction is a generic approach, and much recent work has focused on designing specific conformal procedures to have good performance according to additional desiderata such as small set sizes [16], coverage that is approximately balanced across regions of feature space [17][18][19][20][21][22][23], and errors balanced across classes [16,24,25,26]. Recent extensions also address topics such as distribution estimation [27], causal inference [28], survival analysis [29], differential privacy [30], outlier detection [31], speeding up the test-time evaluation of complex models [32,33], the few-shot setting [34], and handling of testing distribution shift [35][36][37][38].…”
Section: Related Work (mentioning)
confidence: 99%
“…Finally, we mention that the R-value has a nice interpretation under the conformal inference framework. Section A.3 in the Supplementary Material shows that a variation of our R-value corresponds to the Benjamini-Hochberg (BH) adjusted q-value of the conformal p-values (Bates et al, 2021) under the one-class classification setting. The connection to conformal inference and the BH method, both of which are model-free, provides insights on why the FASI algorithm is assumption-lean and offers exact FSR control in finite samples as claimed in Theorem 1.…”
Section: Why Fasi Work? (mentioning)
confidence: 99%
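The statement above refers to Benjamini-Hochberg (BH) adjusted q-values of conformal p-values. A minimal sketch of the standard BH adjustment follows, assuming the p-values have already been computed; the function name `bh_qvalues` is hypothetical, and this is the textbook adjustment rather than the R-value variation described in the cited work.

```python
import numpy as np

def bh_qvalues(pvalues):
    """Standard Benjamini-Hochberg adjusted p-values (illustrative sketch)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by m / k.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value down keeps q-values monotone.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)  # cap at 1
    return q

# Toy p-values, made up purely for illustration.
qvals = bh_qvalues([0.01, 0.04, 0.03, 0.5])
# Points with q-value <= alpha are the ones BH would reject at level alpha.
rejected = qvals <= 0.1
```

Rejecting every point whose q-value is at most alpha gives the same rejection set as running the BH step-up procedure at level alpha.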
“…This approach is however supervised, as it requires OOD datapoints to train the detector which may not generalize to unseen OOD datapoints. Recently, there has been interest in unsupervised detection based on ICAD (Cai and Koutsoukos 2020; Bates et al 2021). Cai and Koutsoukos (2020) propose to use Martingale test (Fedorova et al 2012) on p-values from NCM based on either variational autoencoders (VAE) or deep support vector data description (SVDD) for OOD detection in time series data, where a batch of data is available for detection.…”
Section: Introduction (mentioning)
confidence: 99%
“…Cai and Koutsoukos (2020) propose to use Martingale test (Fedorova et al 2012) on p-values from NCM based on either variational autoencoders (VAE) or deep support vector data description (SVDD) for OOD detection in time series data, where a batch of data is available for detection. Bates et al (2021) focus on problems that arise in conformal detection when multiple points are tested for OOD-ness. iDECODe proposed for the detection of a single point as OOD is also built on ICAD framework, which guarantees a bounded false detection rate (FDR).…”
Section: Introduction (mentioning)
confidence: 99%