2021
DOI: 10.48550/arxiv.2103.01294
Preprint

Wide Network Learning with Differential Privacy

Abstract: Despite intense interest and considerable effort, the current generation of neural networks suffers a significant loss of accuracy under most practically relevant privacy training regimes. One particularly challenging class of neural networks is the wide ones, such as those deployed for NLP typeahead prediction or recommender systems. Observing that these models share something in common, an embedding layer that reduces the dimensionality of the input, we focus on developing a general approach towards training t…

Cited by 8 publications (5 citation statements) | References 30 publications
“…On the other hand, as discussed in Section 3, hybrid clipping with an ℓ∞-norm constraint is determined only by a few stable aggregate statistics, such as the principal components and the average power in each of them, which capture the population statistics of the underlying processed output distribution. This is a much smoother operation than many other existing clipping methods, such as sparsification [34,51,53], where only significant coordinates are preserved or participate in the processing while the remaining ones are either frozen or removed. Though these artificial dimension-reduction techniques can also decrease the noise scale, the advantage can easily be offset by the large clipping bias they produce, and they may not outperform simple ℓ2-norm clipping, especially in deep learning [11].…”
Section: A. Additional Discussion
confidence: 99%
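To make the two clipping operations contrasted in this citation statement concrete, the following is a minimal NumPy sketch, not taken from any of the cited papers: simple ℓ2-norm clipping versus top-k sparsification of a per-example gradient. The clip bound C and sparsity level k are illustrative parameters.

```python
import numpy as np

def l2_clip(grad, C):
    """Scale grad so its l2 norm is at most C (standard DP-SGD clipping)."""
    norm = np.linalg.norm(grad)
    return grad * min(1.0, C / (norm + 1e-12))

def top_k_sparsify(grad, k):
    """Keep only the k largest-magnitude coordinates; zero out the rest.
    This shrinks the dimension the noise must cover, but introduces the
    clipping bias discussed in the excerpt above."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    return sparse

rng = np.random.default_rng(0)
g = rng.normal(size=10_000)                        # a per-example gradient
print(np.linalg.norm(l2_clip(g, C=1.0)))           # <= 1.0
print(np.count_nonzero(top_k_sparsify(g, k=100)))  # 100
```

The sketch only shows the shape of the two operations; which one wins in practice depends on the bias–noise trade-off the excerpt describes.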
“…where e^(t) is some Gaussian noise. Running for T iterations with a total privacy budget (ε, δ), one may select e^(t) ∼ N(0, σ²I), with σ calibrated to the budget. Another critical motivation behind these experiments is to evaluate the performance of classic dimension-reduction clipping methods, such as sparsification [34], [51], [53] (preserving only significant coordinates) or low-rank embedding [50] (projection to a subspace). From a theoretical perspective, these strategies can artificially alleviate the curse of dimensionality, as the scale of noise is now determined by the Hamming weight after sparsification or the rank of the embedding.…”
Section: Preliminaries
confidence: 99%
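To illustrate the noisy update described in this excerpt, here is a hedged sketch of a single DP-SGD-style iteration in NumPy. The noise multiplier sigma is assumed to have been pre-calibrated to the (ε, δ) budget over T iterations (e.g., with a privacy accountant), which is outside the scope of this sketch; the batch size, clip bound, and learning rate are illustrative.

```python
import numpy as np

def noisy_sgd_step(params, per_example_grads, C, sigma, lr, rng):
    """One DP-SGD-style update: clip each example's gradient to l2 norm C,
    average, and add Gaussian noise e^(t) ~ N(0, (sigma * C / n)^2 I)."""
    n = len(per_example_grads)
    clipped = [g * min(1.0, C / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(scale=sigma * C / n, size=params.shape)
    return params - lr * (avg + noise)

rng = np.random.default_rng(1)
params = np.zeros(50)
grads = [rng.normal(size=50) for _ in range(32)]  # a batch of per-example grads
params = noisy_sgd_step(params, grads, C=1.0, sigma=1.1, lr=0.1, rng=rng)
```

Here sigma = 1.1 is a placeholder; in a real run it would come from the accountant for the chosen (ε, δ) and T.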
“…However, the authors point out that the high degree of noise required to ensure a high level of privacy directly impacts the relative ranking of the models' performances. More recent works extend differential-privacy methods to complex deep recommender systems, such as wide-and-deep architectures [30] and collaborative bandit learning [26].…”
Section: Privacy
confidence: 99%
“…Application. A common use case for the sparse vector technique (SVT) is the differentially private release of only those entries of a vector with large magnitude, instead of the entire vector [27,39]. This can be desirable for multiple reasons: to release the retained entries with less noise, since the privacy budget needs to be divided among fewer entries; to release only those values of a histogram that are large enough in magnitude that they will not be dominated by the added noise; or to reduce communication costs in a distributed setting.…”
confidence: 99%
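To make this SVT use case concrete, below is a minimal sketch of the classic AboveThreshold variant of the sparse vector technique in NumPy. It reports the indices of entries whose noisy value exceeds a noisy threshold and halts after c reports. The half-and-half budget split, sensitivity-1 entries, and all parameter names are assumptions made for illustration, not the specific constructions of [27] or [39].

```python
import numpy as np

def sparse_vector(values, threshold, epsilon, c, rng):
    """AboveThreshold-style SVT: report indices of entries whose noisy value
    exceeds a noisy threshold, stopping after c reports.
    Assumes each entry is a sensitivity-1 query; the budget is split evenly
    between the threshold noise and the per-query noise."""
    eps1, eps2 = epsilon / 2.0, epsilon / 2.0
    rho = rng.laplace(scale=2.0 / eps1)          # noisy threshold offset
    released, count = [], 0
    for i, v in enumerate(values):
        nu = rng.laplace(scale=4.0 * c / eps2)   # fresh per-query noise
        if v + nu >= threshold + rho:
            released.append(i)
            count += 1
            if count >= c:
                break
            rho = rng.laplace(scale=2.0 / eps1)  # refresh after each report
    return released

rng = np.random.default_rng(2)
hist = rng.integers(0, 5, size=1000).astype(float)
hist[[10, 200, 777]] += 50.0                     # a few large-magnitude entries
print(sparse_vector(hist, threshold=25.0, epsilon=1.0, c=3, rng=rng))
```

Only the flagged indices are released here; spending additional budget to publish noisy values for those entries is the "release with less noise" benefit the excerpt mentions.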