Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds

Zhou, Yingxue; Chen, Xiangyi; Mei, Hong; Wu, Zhiwei Steven; Banerjee, Arindam

doi:10.48550/arxiv.2006.13501

Cited by 2 publications

(5 citation statements)

References 25 publications


“…suggests that faster convergence of non-private training could translate to better private learning. DP-SGD with adaptive updates (e.g., Adam (Kingma & Ba, 2015)) indeed sometimes leads to small improvements (Papernot et al, 2020b; Chen & Lee, 2020; Zhou et al, 2020a). Investigating private variants of second-order optimization methods is an interesting direction for future work. • More training steps (a.k.a. more data): For a fixed DP-budget ε and noise scale σ, increasing the training set size N allows for running more steps of DP-SGD (McMahan et al, 2018).…”

mentioning

confidence: 99%

Differentially Private Learning Needs Better Features (or Much More Data)

Tramèr, Boneh

2020

Preprint


We demonstrate that differentially private machine learning has not yet reached its "AlexNet moment" on many canonical vision tasks: linear models trained on handcrafted features significantly outperform end-to-end deep neural networks for moderate privacy budgets. To exceed the performance of handcrafted features, we show that private learning requires either much more private data, or access to features learned on public data from a similar domain. Our work introduces simple yet strong baselines for differentially private learning that can inform the evaluation of future progress in this area.


“…The bounded gradient assumption is a common assumption for the analysis of DP-SGD algorithms […, Zhou et al, 2020a] and also frequently used in general adaptive gradient methods such as Adam [Reddi et al, 2021, Chen et al, 2018, Reddi et al, 2018]. One recent popular approach to relax this assumption is using the gradient clipping method [Chen et al, 2020, Andrew et al, 2019, Pichapati et al, 2019], which we will discuss more in Section 6 as well as in Appendix A.…”

Section: Preliminaries

mentioning

confidence: 99%

“…While most of them focus on convex functions, we study DP-ERM with nonconvex loss functions. As most existing algorithms achieving differential privacy in ERM are based on the gradient perturbation [Bassily et al, 2014, Zhou et al, 2020a], we thus study gradient perturbation.…”

Section: Related Work

mentioning

confidence: 99%

“…where the loss function $f(\cdot) : \mathbb{R}^d \times X \to \mathbb{R}$ is non-convex and smooth at each data point. To measure the performance of gradient-based algorithms for ERM, which enjoys privacy guarantees, we define the utility by using the expected $\ell_2$-norm of gradient, i.e., $\mathbb{E}[\|\nabla F(\theta)\|]$, where the expectation is taken over the randomness of the algorithm […, Zhang et al, 2017, Zhou et al, 2020a].…”

Section: Introduction

mentioning

confidence: 99%

“…Recent popular techniques for tuning $\eta_t$ include adaptive gradient methods Duchi et al [2011] and decaying stepsize schedules Goyal et al [2017]. When applying non-constant stepsizes, most of the existing differentially private algorithms directly follow the standard DP-SGD strategy by adding a simple perturbation (i.e., $Z \sim \mathcal{N}(0, \sigma^2 I)$) to each gradient over the entire sequence of iterations [Zhou et al, 2020a]. This results in a uniformly-distributed privacy budget for each iteration [Bassily et al, 2014].…”

Section: Introduction

mentioning

confidence: 99%
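
The citation statement above describes the standard DP-SGD strategy: the same Gaussian perturbation $Z \sim \mathcal{N}(0, \sigma^2 I)$ is added to the gradient at every iteration, even when the stepsize $\eta_t$ varies. A minimal NumPy sketch of one such step follows. It is an illustration under stated assumptions, not the implementation from any of the cited papers: the function name `dp_sgd_step`, the per-example clipping (used here to enforce a bounded gradient), and the `clip_norm` parameter are hypothetical, and the sketch does not calibrate `sigma` to any (ε, δ) privacy budget.

```python
import numpy as np


def dp_sgd_step(theta, per_example_grads, eta_t, sigma, clip_norm, rng):
    """One noisy gradient step: clip, average, perturb, update.

    Hypothetical helper for illustration only. `sigma` is taken as given;
    choosing it to meet a privacy budget is outside this sketch.
    """
    # Assumption: enforce bounded gradients by clipping each per-example
    # gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Add Z ~ N(0, sigma^2 I) to the averaged gradient. The same sigma is
    # used at every iteration, regardless of the stepsize eta_t.
    noise = rng.normal(0.0, sigma, size=theta.shape)
    noisy_grad = clipped.mean(axis=0) + noise

    return theta - eta_t * noisy_grad


def run_decaying_stepsize(theta0, data, steps, sigma, clip_norm, seed=0):
    """Toy loop on the loss 0.5 * ||theta - x||^2 with eta_t = 1 / sqrt(t)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for t in range(1, steps + 1):
        grads = theta - data  # per-example gradients, shape (n, d)
        theta = dp_sgd_step(theta, grads, 1.0 / np.sqrt(t), sigma, clip_norm, rng)
    return theta
```

Because `sigma` is fixed while `eta_t` decays, the noise contributes `eta_t * Z` to each update, which is the uniform per-iteration treatment the quoted passage attributes to standard DP-SGD.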


Adaptive Differentially Private Empirical Risk Minimization

Wu, Wang, Cristali et al. 2021

Preprint


We propose an adaptive (stochastic) gradient perturbation method for differentially private empirical risk minimization. At each iteration, the random noise added to the gradient is optimally adapted to the stepsize; we name this process adaptive differentially private (ADP) learning. Given the same privacy budget, we prove that the ADP method considerably improves the utility guarantee compared to the standard differentially private method in which vanilla random noise is added. Our method is particularly useful for gradient-based algorithms with time-varying learning rates, including variants of AdaGrad (Duchi et al., 2011). We provide extensive numerical experiments to demonstrate the effectiveness of the proposed adaptive differentially private algorithm.

