Differentially Private ANOVA Testing

Campbell, Zachary; Bray, Andrew; Ritz, Anna; Groce, Adam

doi:10.1109/icdis.2018.00052

Cited by 22 publications

(31 citation statements)

References 9 publications

“…The computational experiments allow us to optimize ρ, a parameter that determines the allocation of our privacy budget between the two important intermediate values. We also compare our method to prior work [3], and show an order of magnitude improvement in statistical power.…”

Section: Introduction (mentioning)

confidence: 87%

“…Note that because neighboring databases are the same size, N can always be released without compromising privacy. Differential privacy has several useful properties. One of the most useful is composition: Theorem 1 (Composition).…”

Section: Differential Privacy (mentioning)

confidence: 99%

“…While it may be tempting to compute the p-value using the reference distribution for the non-private statistic one is estimating, this may yield wildly inaccurate results [3], because adding noise to the statistic increases the probability of outlier output values. Instead, we must compute the reference distribution for the noisy statistic.…”

Section: Differentially Private Hypothesis Testing (mentioning)

confidence: 99%
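
The point made in the statement above, that the p-value has to come from the distribution of the noisy statistic rather than the classical one, can be illustrated with a small Monte Carlo sketch. Everything named here is hypothetical and not taken from the cited papers: `monte_carlo_p_value`, `toy_noisy_null_stat`, the toy null statistic (a squared standard normal) and the noise scale are assumptions chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def toy_noisy_null_stat(rng, noise_scale=0.5):
    """Hypothetical null statistic: squared standard normal plus Laplace noise."""
    return rng.gauss(0.0, 1.0) ** 2 + laplace_noise(noise_scale, rng)


def monte_carlo_p_value(observed, sample_null_stat, n_sims=2000, seed=0):
    """Estimate P(noisy statistic >= observed) under the null by simulation."""
    rng = random.Random(seed)
    exceed = sum(1 for _ in range(n_sims) if sample_null_stat(rng) >= observed)
    # The +1 terms keep the estimate strictly positive (a conservative choice).
    return (exceed + 1) / (n_sims + 1)


# Usage: the p-value is read off the simulated noisy null distribution.
p = monte_carlo_p_value(4.0, toy_noisy_null_stat)
print(p)
```

Because the simulated draws already include the privacy noise, the heavier tails that the noise introduces are reflected in the estimate instead of being ignored.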

“…and Nguyên and Hui propose a test for survival analysis data [13]. There is one prior work on private ANOVA testing, that of Campbell et al. [3]. We will discuss this result in greater depth in the next section.…”

Section: Related Work (mentioning)

confidence: 99%

“…The only previous work on differentially private ANOVA testing that the authors are aware of is Campbell et al. [3]. Using the ANOVA test as defined above, they analyze the sensitivity of the SSA and SSE with the assumption that all data was normalized to be between 0 and 1 and add Laplacian noise proportional to these sensitivities to the public computation of the SSA and SSE. Their algorithm then uses post-processing to calculate the noisy F-statistic, and returns this in addition to the noisy SSA and SSE (Algorithm 1).…”

Section: Prior Work On Private ANOVA (mentioning)

confidence: 99%
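
The approach described in the statement above (Laplace noise on SSA and SSE, then a noisy F-statistic by post-processing) can be sketched roughly as follows. This is not the cited Algorithm 1: the function names are hypothetical, the sensitivity bounds `sens_ssa` and `sens_sse` are left as caller-supplied parameters because their values are not given in this excerpt, and the even split of epsilon between the two queries is an assumption. Data is assumed to lie in [0, 1] and every group is assumed non-empty.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def ssa_sse(groups):
    """Return (SSA, SSE): between-group and within-group sums of squares."""
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return ssa, sse


def noisy_f(groups, epsilon, sens_ssa, sens_sse, rng):
    """Add Laplace noise to SSA and SSE, then post-process into a noisy F."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    ssa, sse = ssa_sse(groups)
    eps_each = epsilon / 2.0  # assumption: budget split evenly (composition)
    noisy_ssa = ssa + laplace_noise(sens_ssa / eps_each, rng)
    noisy_sse = sse + laplace_noise(sens_sse / eps_each, rng)
    # Post-processing only: no further access to the raw data is needed.
    f = (noisy_ssa / (k - 1)) / (noisy_sse / (n - k))
    return f, noisy_ssa, noisy_sse


# Usage with placeholder sensitivities (hypothetical values, not from the paper).
data = [[0.1, 0.4, 0.3], [0.6, 0.8, 0.7], [0.2, 0.5, 0.9]]
print(noisy_f(data, epsilon=1.0, sens_ssa=1.0, sens_sse=1.0, rng=random.Random(0)))
```

With small samples the noisy SSE can come out near zero or negative, so the returned F value is not guaranteed to be positive.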

Improved Differentially Private Analysis of Variance

Swanberg, Globus-Harris, Griffith, et al. 2019

Proceedings on Privacy Enhancing Technologies

Self Cite

Hypothesis testing is one of the most common types of data analysis and forms the backbone of scientific research in many disciplines. Analysis of variance (ANOVA) in particular is used to detect dependence between a categorical and a numerical variable. Here we show how one can carry out this hypothesis test under the restrictions of differential privacy. We show that the F-statistic, the optimal test statistic in the public setting, is no longer optimal in the private setting, and we develop a new test statistic F1 with much higher statistical power. We show how to rigorously compute a reference distribution for the F1 statistic and give an algorithm that outputs accurate p-values. We implement our test and experimentally optimize several parameters. We then compare our test to the only previous work on private ANOVA testing, using the same effect size as that work. We see an order of magnitude improvement, with our test requiring only 7% as much data to detect the effect.

[…] that this gene must indeed affect the given health outcome. (For more detail on how ANOVA is used in this setting, see [12].)

The analysis described above assumes that the researcher has full access to the database. However, there are many settings in medicine, psychology, education, and economics (not to mention private-sector data analysis) where the database is not available to the analyst due to privacy concerns. A well-established solution is to allow the researcher to issue queries to the data which are proven to satisfy differential privacy. Differential privacy requires the addition of random noise to statistical queries and guarantees that the results reveal very little about any individual's data.

In this paper we propose a new statistic for ANOVA, called F1, that is specifically tailored to the differentially private setting. This statistic measures the same variations as the F-statistic, but uses |a − b| instead of (a − b)^2 to measure the distance between a and b. In the public setting F1 is a worse test statistic than the traditional F-statistic, but we show that in the private setting it has much higher power than the previously published differentially private F-statistic. That is, we show that it can detect effects with as little as 7% of the data that was previously required. (In one example, an effect that took 5300 data points to detect 90% of the time with ε = 1 in the prior work takes only 350 data points to detect using our new hypothesis test.)

Contributions and organization. We first review differential privacy, hypothesis testing, and the body of work that lies at the intersection of the two fields (Section 2). In Section 3 we then present a new test statistic, F1, for ANOVA in the private setting. While there is some work on differentially private hypothesis testing, designing a new test statistic explicitly tailored for compatibility with differential privacy has been done by few others [14].
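
To make the "absolute value instead of square" idea in the abstract concrete, here is a minimal non-private sketch. The function name `f1_statistic` is hypothetical, and the degrees-of-freedom normalization (k − 1 and N − k, copied from the classical F-statistic) is an assumption, since the abstract does not spell it out. No privacy noise or reference distribution is included; groups are assumed non-empty.

```python
def f1_statistic(groups):
    """Absolute-deviation analogue of the ANOVA F-statistic (no noise added)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: |group mean - grand mean|, weighted by group size.
    sa = sum(len(g) * abs(m - grand_mean) for g, m in zip(groups, means))
    # Within-group variation: |observation - its group mean|.
    se = sum(abs(x - m) for g, m in zip(groups, means) for x in g)
    return (sa / (k - 1)) / (se / (n - k))


# Usage on a tiny two-group example with values in [0, 1].
print(f1_statistic([[0.0, 1.0], [1.0, 1.0]]))
```

A private version would still need calibrated noise and a matching reference distribution, both of which the abstract says the paper provides.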

DPCL: Contrastive representation learning with differential privacy

Yan et al. 2022

Int J of Intelligent Sys

With the proliferation of unlabeled data, increasing efforts have been devoted to unsupervised learning. As one of the most representative branches of unsupervised learning, contrastive learning has made great progress with its high efficiency. Unfortunately, privacy threats to contrastive learning have become sophisticated, making it imperative to develop effective technologies that can deal with such threats. To alleviate the privacy issue in contrastive learning, we propose some novel techniques based on differential privacy, which aim at reducing the high sensitivity of gradient in the private training caused by interactive contrastive learning. Specifically, we add differentially private protection to the connection point related to different per-example gradients, which decreases the sensitivity of the gradients significantly. Our experiments on SimCLR and the Barlow Twins show that our approach is superior since it is more accurate while maintaining the same level of privacy protection.
