2021
DOI: 10.1016/j.neunet.2021.04.034
Statistical guarantees for regularized neural networks

Cited by 15 publications (15 citation statements)
References 14 publications
“…Note that the above list gives at most R := M + (β(M + 1))^r different parameters. Taking into account (11) we can use P̃_β f instead of P_β f to approximate f. Thus, we can replace the entries c_{x_ℓ,γ}/B by the entries c̃_{x_ℓ,γ}/B = k/2^b, where k is some integer from [−2^b, 2^b].…”
Section: Proofs (mentioning)
confidence: 99%
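
To make the quantization step in this excerpt concrete, here is a minimal Python sketch (not taken from the citing paper; the function name quantize_entries and the example values are purely illustrative) of rounding normalized entries to grid points k/2^b with an integer k in [−2^b, 2^b]:

import numpy as np

def quantize_entries(c_over_B, b):
    # Round each normalized entry (assumed to lie in [-1, 1]) to the nearest
    # grid point k / 2**b, with the integer k clipped to [-2**b, 2**b].
    k = np.clip(np.rint(c_over_B * 2**b), -2**b, 2**b)
    return k / 2**b

entries = np.array([-0.97, -0.26, 0.0, 0.41, 0.88])
print(quantize_entries(entries, b=3))  # -> -1.0, -0.25, 0.0, 0.375, 0.875

With b bits there are at most 2^{b+1} + 1 distinct values per entry, which is what keeps the count of different parameters above finite.
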
“…Using this entropy bound, it is then shown in [10] that if the regression function is a composition of Hölder smooth functions, then sparse neural networks with depth log₂ n, width n^{t/(2β+t)}, and a number of non-zero parameters ∼ n^{t/(2β+t)} log₂ n, where β > 0 and t ≥ 1 depend on the structure and the smoothness of the regression function, attain the minimax optimal prediction error rate n^{−2β/(2β+t)} (up to a logarithmic factor). Entropy bounds for the spaces of neural networks with certain ℓ₁-related regularizations are provided in [7] and [11], and their derivation is also based on the sparsity induced by the imposed constraints. In particular, in [7] the above ℓ₀ regularization is replaced by the clipped ℓ₁ norm regularization with a sufficiently small clipping threshold.…”
Section: Introduction (mentioning)
confidence: 99%
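
As a rough plug-in illustration of the sizes and rate quoted above, the following Python sketch uses hypothetical values of n, β, and t (not taken from [10], [7], or [11]) and also evaluates one common definition of the clipped ℓ₁ norm, ∑_j min(|θ_j|/τ, 1):

import math

n, beta, t = 10_000, 2.0, 4.0                 # hypothetical sample size, smoothness, dimension
depth = math.log2(n)                          # ~ log_2 n
width = n ** (t / (2 * beta + t))             # ~ n^{t/(2*beta+t)}
nonzeros = width * math.log2(n)               # ~ n^{t/(2*beta+t)} * log_2 n
rate = n ** (-2 * beta / (2 * beta + t))      # prediction error rate n^{-2*beta/(2*beta+t)}
print(f"depth ~ {depth:.1f}, width ~ {width:.0f}, non-zeros ~ {nonzeros:.0f}, rate ~ {rate:.4f}")

theta, tau = [0.0, 0.003, -0.8, 1.5], 0.01    # hypothetical weights and clipping threshold
clipped_l1 = sum(min(abs(w) / tau, 1.0) for w in theta)   # 0 + 0.3 + 1 + 1 = 2.3

For n = 10,000, β = 2, and t = 4 this gives depth ≈ 13.3, width = 100, roughly 1,329 non-zero parameters, and a rate of 0.01.
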
“…We also point out the fact that the assumption r ≥ |d_{(y_sgd, x_sgd)}| also makes sense from a statistical perspective; indeed, it is well known that the tuning parameters of regularizers should scale with the differences between predicted and actual outputs. We refer to Huang et al. [2021] and Taheri et al. [2021] for some insights on this topic.…”
Section: Strictly Positive Limit (mentioning)
confidence: 99%
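
A standard prototype of this calibration principle, borrowed from linear lasso theory rather than from the specific conditions in Huang et al. [2021] or Taheri et al. [2021], requires the tuning parameter to dominate the correlation between the design and the residuals at the true parameter:

\[
\hat{\theta} \in \operatorname*{arg\,min}_{\theta}\Big\{\tfrac{1}{n}\lVert y - X\theta\rVert_2^2 + \lambda\lVert\theta\rVert_1\Big\},
\qquad
\lambda \;\gtrsim\; \frac{2}{n}\,\bigl\lVert X^{\top}(y - X\theta^{*})\bigr\rVert_{\infty}.
\]

Since the right-hand side depends on the differences between predicted and actual outputs, larger residuals call for larger tuning parameters, which is the scaling referred to in the excerpt.
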
“…Although the architecture is fixed, the superexpressiveness of activations alone does not tell much about the complexity of approximant networks, which in this case is associated with the choices of network weights. Even if we bound the weights required for attaining a given approximation error, estimating the entropy of networks may still need the activations to be Lipschitz continuous (see, e.g., [4], [5]). Thus, to bound the entropy of approximant networks we may not only bound the weights but also discretize them.…”
(mentioning)
confidence: 99%
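
A schematic counting argument (not the precise statement of [4] or [5]) shows why discretizing the weights bounds the entropy: if the fixed architecture has T weight positions, at most S of them are non-zero, and each kept weight is restricted to the grid {k/2^b : k = −2^b, …, 2^b}, then

\[
\#\mathcal{F}_{\mathrm{disc}} \;\le\; \binom{T}{S}\,\bigl(2^{b+1}+1\bigr)^{S},
\qquad
\log \#\mathcal{F}_{\mathrm{disc}} \;\le\; S\log T + S\log\bigl(2^{b+1}+1\bigr),
\]

so the entropy grows only linearly in the sparsity S and the bit depth b. Lipschitz continuity of the activations is what allows this count over discretized weights to be transferred to a covering of the original networks.
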