We study the problem of subsampling in differential privacy (DP), a technique at the core of many successful differentially private machine learning algorithms. Specifically, we provide a tight upper bound on the Rényi Differential Privacy (RDP) [Mironov, 2017] parameters of algorithms that (1) subsample the dataset, and then (2) apply a randomized mechanism M to the subsample, in terms of the RDP parameters of M and the subsampling probability. Our results generalize the moments accountant technique, developed by [Abadi et al., 2016] for the Gaussian mechanism, to any subsampled RDP mechanism.
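To make the accounting concrete, here is a minimal sketch of the moments-accountant-style computation for one well-known special case: the Gaussian mechanism under Poisson subsampling, evaluated at integer RDP orders. The binomial expansion below is the standard one used in DP accounting libraries for this special case; the function names, the composition loop, and the RDP-to-(ε, δ) conversion ε(α) + log(1/δ)/(α−1) are our own additions, not code from the paper.

```python
import math

def _logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def rdp_subsampled_gaussian(q, sigma, alpha):
    """RDP parameter epsilon(alpha), for integer alpha >= 2, of the
    Gaussian mechanism (noise multiplier sigma) applied to a Poisson
    subsample drawn with probability q. Uses the standard binomial
    expansion of the alpha-th moment of the privacy loss."""
    assert isinstance(alpha, int) and alpha >= 2 and 0 < q < 1
    log_terms = []
    for k in range(alpha + 1):
        log_binom = (math.lgamma(alpha + 1) - math.lgamma(k + 1)
                     - math.lgamma(alpha - k + 1))
        log_terms.append(log_binom
                         + k * math.log(q)
                         + (alpha - k) * math.log(1 - q)
                         + (k * k - k) / (2 * sigma ** 2))
    return _logsumexp(log_terms) / (alpha - 1)

def dp_after_t_steps(q, sigma, steps, delta, orders=range(2, 64)):
    """RDP composes additively over iterations; convert to (eps, delta)-DP
    via eps(alpha) + log(1/delta)/(alpha - 1) and take the best order."""
    return min(steps * rdp_subsampled_gaussian(q, sigma, a)
               + math.log(1 / delta) / (a - 1) for a in orders)

# Example: 1000 SGD steps, 1% sampling rate, noise multiplier 1.0.
# print(dp_after_t_steps(q=0.01, sigma=1.0, steps=1000, delta=1e-5))
```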
Sparse Subspace Clustering (SSC) and Low-Rank Representation (LRR) are both considered state-of-the-art methods for subspace clustering. The two methods are fundamentally similar in that both are convex optimizations exploiting the intuition of "Self-Expressiveness". The main difference is that SSC minimizes the vector ℓ1 norm of the representation matrix to induce sparsity, while LRR minimizes the nuclear norm (a.k.a. trace norm) to promote a low-rank structure. Because the representation matrix is often simultaneously sparse and low-rank, we propose a new algorithm, termed Low-Rank Sparse Subspace Clustering (LRSSC), that combines SSC and LRR, and we develop theoretical guarantees for when the algorithm succeeds. The results reveal interesting insights into the strengths and weaknesses of SSC and LRR, and demonstrate how LRSSC can combine the advantages of both methods, preserving the "Self-Expressiveness Property" and "Graph Connectivity" at the same time.
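As a concrete illustration, here is a minimal sketch of the noiseless LRSSC program in Python, using cvxpy as a generic convex solver. The paper analyzes the program rather than prescribing a solver, so the solver choice, the helper name, and the final affinity-matrix step are our assumptions.

```python
import numpy as np
import cvxpy as cp

def lrssc(X, lam=1.0):
    """Noiseless LRSSC sketch: each column of X (one data point per
    column) is expressed by the other columns, with the representation
    matrix C encouraged to be simultaneously low-rank and sparse:
        min ||C||_* + lam * ||C||_1   s.t.   X = X C,  diag(C) = 0.
    lam trades off LRR-like (lam -> 0) and SSC-like (lam -> infinity)
    behavior."""
    n = X.shape[1]
    C = cp.Variable((n, n))
    objective = cp.Minimize(cp.normNuc(C) + lam * cp.sum(cp.abs(C)))
    constraints = [X @ C == X, cp.diag(C) == 0]
    cp.Problem(objective, constraints).solve()
    # Symmetrized affinity matrix for downstream spectral clustering.
    return np.abs(C.value) + np.abs(C.value).T
```

The equality constraint X = XC with a zero diagonal is feasible when the points lie in a union of low-dimensional subspaces with enough points per subspace, which is the regime the guarantees address.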
We consider a refinement of differential privacy: per-instance differential privacy (pDP), which captures the privacy of a specific individual with respect to a fixed data set. We show that this is a strict generalization of standard DP that inherits all of its desirable properties, e.g., composition, invariance to side information, and closure under post-processing, except that they all hold for every instance separately. When the data is drawn from a distribution, we show that moments of per-instance DP imply generalization. Moreover, we provide explicit calculations of the per-instance DP of output perturbation on a class of smooth learning problems. The results reveal an interesting and intuitive fact: an individual has stronger privacy if he/she has a small "leverage score" with respect to the data set and if he/she can be predicted more accurately using the leave-one-out data set. Simulations show a privacy-utility trade-off that is several orders of magnitude more favorable when we consider only the privacy of the users in the data set. In a case study on differentially private linear regression, we provide a novel analysis of the One-Posterior-Sample (OPS) estimator and show that, when the data set is well-conditioned, it provides (ε, δ)-pDP for any target individual and matches the exact lower bound up to a 1 + Õ(n⁻¹ε⁻²) multiplicative factor. We also demonstrate how a "pDP to DP conversion" step can be used to design AdaOPS, which uses adaptive regularization to achieve the same results with (ε, δ)-DP.
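To illustrate the leverage-score intuition, here is a minimal sketch under assumptions of ours: ridge regression as the smooth learning problem, Gaussian output perturbation, and invented helper names. The Sherman-Morrison identity gives the per-instance sensitivity in closed form, making the abstract's observation explicit: small leverage plus good leave-one-out predictability means small sensitivity, hence stronger pDP.

```python
import numpy as np

def ridge_loo_sensitivity(X, y, lam, i):
    """Per-instance (leave-one-out) sensitivity of the ridge estimator
    theta = (X^T X + lam I)^{-1} X^T y with respect to removing row i,
    computed in closed form via Sherman-Morrison."""
    H_inv = np.linalg.inv(X.T @ X + lam * np.eye(X.shape[1]))
    theta = H_inv @ X.T @ y
    x_i = X[i]
    e_i = y[i] - x_i @ theta        # residual of point i under the full fit
    h_i = x_i @ H_inv @ x_i         # regularized leverage score (< 1 if lam > 0)
    # ||theta - theta_{-i}|| = |e_i| * ||H^{-1} x_i|| / (1 - h_i):
    # small leverage and small residual imply small per-instance sensitivity.
    return abs(e_i) * np.linalg.norm(H_inv @ x_i) / (1 - h_i)

def pdp_epsilon(sens_i, sigma, delta=1e-6):
    """(eps, delta)-pDP of Gaussian output perturbation N(0, sigma^2 I),
    obtained by plugging the per-instance sensitivity into the classic
    Gaussian-mechanism calibration (valid when the returned eps <= 1)."""
    return sens_i * np.sqrt(2 * np.log(1.25 / delta)) / sigma
```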