2021
DOI: 10.48550/arxiv.2106.07724
Preprint

An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks

Shashank Rajput, Kartik Sreenivasan, Dimitris Papailiopoulos, et al.

Abstract: It is well known that modern deep neural networks are powerful enough to memorize datasets even when the labels have been randomized. Recently, Vershynin (2020) settled a long-standing question by Baum (1988), proving that deep threshold networks can memorize n points in Õ(e^{1/δ²} + √n) neurons and Õ(e^{1/δ²}(d + √n) + n) weights, where δ is the minimum distance between the points. In this work, we improve the dependence on δ from exponential to almost linear, proving that Õ(1/δ + √n) neurons and Õ(d/δ + n) weights are sufficient. Our construction uses Gaussi…
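
To make the abstract's "exponential to almost linear" claim concrete, here is a quick worked comparison of the δ-dependent term in the two neuron bounds quoted above; the choice δ = 0.1 is an illustrative assumption, not a value from the paper:

% δ-dependent term of each neuron bound, at the illustrative value δ = 0.1
% prior bound (Vershynin 2020): \widetilde{O}(e^{1/\delta^2} + \sqrt{n}) neurons
% this paper:                   \widetilde{O}(1/\delta + \sqrt{n}) neurons
\begin{align*}
  e^{1/\delta^2} &= e^{1/0.01} = e^{100} \approx 2.7 \times 10^{43}, \\
  \frac{1}{\delta} &= \frac{1}{0.1} = 10.
\end{align*}

In other words, the δ-term of the prior bound grows exponentially in 1/δ², while that of the new bound grows only linearly in 1/δ; the √n term is common to both.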

Cited by 1 publication (2 citation statements)
References 10 publications

“…[37] showed that threshold and ReLU networks can memorize N binary-labeled unit vectors in R^d separated by a distance of δ > 0, using Õ(e^{1/δ²} + √N) neurons and Õ(e^{1/δ²}(d + √N) + N) parameters. [27] improved the dependence on δ by giving a construction with Õ(1/δ + √N) neurons and Õ(d/δ + N) parameters. This result holds only for threshold networks, but does not assume that the inputs are on the unit sphere.…”
Section: Related Work (mentioning)
confidence: 99%

“…Many works have shown results regarding the memorization power of neural networks, using different assumptions on the activation function and data samples (see e.g. [19, 18, 4, 37, 13, 14, 8, 26, 17, 39, 40, 25, 27, 32]). The question of memorization also has practical implications for phenomena such as "double descent" [5, 24], which connects the memorization power of neural networks with their generalization capabilities.…”
Section: Introduction (mentioning)
confidence: 99%