Gal Vardi scite author profile

Gal Vardi

5 Publications

34 Citation Statements Received

142 Citation Statements Given

Affiliations

Hebrew University of Jerusalem, Weizmann Institute of Science

Publications

Neural Networks with Small Weights and Depth-Separation Barriers

Vardi, Shamir. 2020. Preprint.

In studying the expressiveness of neural networks, an important question is whether there are functions which can only be approximated by sufficiently deep networks, assuming their size is bounded. However, for constant depths, existing results are limited to depths 2 and 3, and achieving results for higher depths has been an important open question. In this paper, we focus on feedforward ReLU networks, and prove fundamental barriers to proving such results beyond depth 4, by reduction to open problems and natural-proof barriers in circuit complexity. To show this, we study a seemingly unrelated problem of independent interest: Namely, whether there are polynomially-bounded functions which require super-polynomial weights in order to approximate with constant-depth neural networks. We provide a negative and constructive answer to that question, by showing that if a function can be approximated by a polynomially-sized, constant depth k network with arbitrarily large weights, it can also be approximated by a polynomially-sized, depth 3k + 3 network, whose weights are polynomially bounded.

Hardness of Learning Neural Networks with Natural Weights

Daniely, Vardi. 2020. Preprint.

Neural networks are nowadays highly successful despite strong hardness results. The existing hardness results focus on the network architecture, and assume that the network's weights are arbitrary. A natural approach to settle the discrepancy is to assume that the network's weights are "well-behaved" and possess some generic properties that may allow efficient learning. This approach is supported by the intuition that the weights in real-world networks are not arbitrary, but exhibit some "random-like" properties with respect to some "natural" distributions. We prove negative results in this regard, and show that for depth-2 networks, and many "natural" weights distributions such as the normal and the uniform distribution, most networks are hard to learn. Namely, there is no efficient learning algorithm that is provably successful for most weights, and every input distribution. It implies that there is no generic property that holds with high probability in such random networks and allows efficient learning.

On the Optimal Memorization Power of ReLU Neural Networks

Vardi, Yehudai, Shamir. 2021. Preprint.

We study the memorization power of feedforward ReLU neural networks. We show that such networks can memorize any N points that satisfy a mild separability assumption using Õ(√N) parameters. Known VC-dimension upper bounds imply that memorizing N samples requires Ω(√N) parameters, and hence our construction is optimal up to logarithmic factors. We also give a generalized construction for networks with depth bounded by 1 ≤ L ≤ √N, for memorizing N samples using Õ(N/L) parameters. This bound is also optimal up to logarithmic factors. Our construction uses weights with large bit complexity. We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.
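
As a reading aid (not part of the abstract itself), the parameter counts stated above can be set side by side. The labels on the left are informal, and Õ is read here as hiding logarithmic factors, which is an assumption based on the phrase "optimal up to logarithmic factors". The last line is plain arithmetic: substituting L = √N into the depth-bounded count gives the same order as the first bound.

```latex
\begin{align*}
\text{construction (upper bound):}\quad & \tilde{O}\big(\sqrt{N}\big) \text{ parameters},\\
\text{VC-dimension argument (lower bound):}\quad & \Omega\big(\sqrt{N}\big) \text{ parameters},\\
\text{depth bounded by } L,\ 1 \le L \le \sqrt{N}:\quad & \tilde{O}(N/L) \text{ parameters},\\
\text{at } L = \sqrt{N}:\quad & \tilde{O}\big(N/\sqrt{N}\big) = \tilde{O}\big(\sqrt{N}\big).
\end{align*}
```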

Multi-player flow games

Guha, Kupferman, Vardi. 2019. Auton Agent Multi-Agent Syst.

Implicit Regularization Towards Rank Minimization in ReLU Networks

Timor, Vardi, Shamir. 2022. Preprint.

We study the conjectured relationship between the implicit regularization in neural networks, trained with gradient-based methods, and rank minimization of their weight matrices. Previously, it was proved that for linear networks (of depth 2 and vector-valued outputs), gradient flow (GF) w.r.t. the square loss acts as a rank minimization heuristic. However, understanding to what extent this generalizes to nonlinear networks is an open problem. In this paper, we focus on nonlinear ReLU networks, providing several new positive and negative results. On the negative side, we prove (and demonstrate empirically) that, unlike the linear case, GF on ReLU networks may no longer tend to minimize ranks, in a rather strong sense (even approximately, for "most" datasets of size 2). On the positive side, we reveal that ReLU networks of sufficient depth are provably biased towards low-rank solutions in several reasonable settings.
