2020
DOI: 10.48550/arxiv.2006.13501
Preprint
Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds

Abstract: We study differentially private (DP) algorithms for stochastic non-convex optimization. In this problem, the goal is to minimize the population loss over a p-dimensional space given n i.i.d. samples drawn from a distribution. We improve upon the population gradient bound of √p/√n from prior work and obtain a sharper rate of p^{1/4}/√n. We obtain this rate by providing the first analyses of a collection of private gradient-based methods, including the adaptive algorithms DP RMSProp and DP Adam. Our proof techniques …
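The abstract refers to private adaptive methods (DP RMSProp, DP Adam) built on privatized gradients. Below is a minimal sketch, not the paper's exact algorithm, of a DP-Adam-style step in which per-example gradients are clipped, averaged, and perturbed with Gaussian noise before standard Adam moment updates; all function names and hyperparameter values (clip_norm, noise_multiplier, learning rate, betas) are illustrative assumptions.

```python
# Sketch of a DP-Adam-style update: clip per-example gradients, average,
# add Gaussian noise calibrated to the clipping bound, then apply Adam
# moment estimation to the privatized gradient. Hyperparameters are
# illustrative placeholders, not values from the paper.
import numpy as np

def dp_adam_step(w, per_example_grads, m, v, t,
                 lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                 clip_norm=1.0, noise_multiplier=1.0,
                 rng=np.random.default_rng(0)):
    n = len(per_example_grads)
    # Clip each per-example gradient to L2 norm at most clip_norm.
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    # Average and perturb with Gaussian noise proportional to the clip bound.
    noisy_grad = (np.sum(clipped, axis=0)
                  + noise_multiplier * clip_norm * rng.standard_normal(w.shape)) / n
    # Standard Adam moment updates driven by the privatized gradient.
    m = beta1 * m + (1 - beta1) * noisy_grad
    v = beta2 * v + (1 - beta2) * noisy_grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```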


Cited by 2 publications (5 citation statements)
References 25 publications
“…suggests that faster convergence of non-private training could translate to better private learning. DP-SGD with adaptive updates (e.g., Adam (Kingma & Ba, 2015)) indeed sometimes leads to small improvements (Papernot et al., 2020b; Chen & Lee, 2020; Zhou et al., 2020a). Investigating private variants of second-order optimization methods is an interesting direction for future work. • More training steps (a.k.a. more data): For a fixed DP budget ε and noise scale σ, increasing the training set size N allows for running more steps of DP-SGD (McMahan et al., 2018).…”
mentioning
confidence: 99%
“…The bounded gradient assumption is a common assumption for the analysis of DP-SGD algorithms [Zhou et al., 2020a] and is also frequently used in general adaptive gradient methods such as Adam [Reddi et al., 2021, Chen et al., 2018, Reddi et al., 2018]. One recent popular approach to relax this assumption is using the gradient clipping method [Chen et al., 2020, Andrew et al., 2019, Pichapati et al., 2019], which we will discuss more in Section 6 as well as in Appendix A.…”
Section: Preliminaries
mentioning
confidence: 99%
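The citation above contrasts the bounded-gradient assumption with gradient clipping. As a hedged illustration of that relaxation, here is a minimal per-example clipping helper; the function and parameter names are my own and not from any of the cited works. Clipping bounds the contribution of any single example, so the L2 sensitivity of the averaged gradient scales with clip_norm / n regardless of whether the raw gradients are bounded.

```python
# Sketch of per-example gradient clipping: scale each gradient so its L2 norm
# is at most clip_norm, then average. This removes the need for a global
# bounded-gradient assumption when calibrating DP noise.
import numpy as np

def clip_and_average(per_example_grads, clip_norm=1.0):
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g if norm <= clip_norm else g * (clip_norm / norm))
    # Each summand now has norm at most clip_norm, so the average has
    # sensitivity on the order of clip_norm / len(per_example_grads).
    return np.mean(clipped, axis=0)
```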
“…While most of them focus on convex functions, we study DP-ERM with nonconvex loss functions. As most existing algorithms achieving differential privacy in ERM are based on gradient perturbation [Bassily et al., 2014, Zhou et al., 2020a], we thus study gradient perturbation.…”
Section: Related Work
mentioning
confidence: 99%
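The citation above refers to gradient perturbation for DP-ERM. A minimal sketch of that idea follows, assuming per-example gradients bounded by lipschitz_bound (the bounded-gradient assumption discussed earlier); loss_grad, the step count, and the noise scale are illustrative placeholders, not the cited papers' exact procedures.

```python
# Sketch of gradient perturbation for DP-ERM: at each iteration the empirical
# gradient is perturbed with Gaussian noise calibrated to its sensitivity
# before the descent step. All names and defaults are illustrative.
import numpy as np

def dp_gradient_descent(loss_grad, w0, data, steps=50, lr=0.1,
                        lipschitz_bound=1.0, noise_multiplier=1.0,
                        rng=np.random.default_rng(0)):
    w = w0.copy()
    n = len(data)
    for _ in range(steps):
        g = np.mean([loss_grad(w, x) for x in data], axis=0)   # empirical gradient
        noise_std = noise_multiplier * lipschitz_bound / n     # calibrated to sensitivity
        w = w - lr * (g + noise_std * rng.standard_normal(w.shape))
    return w
```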