2020
DOI: 10.48550/arxiv.2008.01772
Preprint

Shallow Univariate ReLu Networks as Splines: Initialization, Loss Surface, Hessian, & Gradient Flow Dynamics

Cited by 7 publications (8 citation statements)
References 0 publications
“…Independent of and concurrent with previous versions (Sahs et al., 2020a, b) of this work, Williams et al. (2019) presents implicit regularization results in the kernel and adaptive regimes which parallel our results in this section rather closely. Despite the similarities, we take a significantly different approach.…”
Section: Theoretical Results (supporting)
confidence: 87%
“…While in practice it is more common that the weights in the hidden layer have a smaller variance (e.g. Glorot and Bengio [2010], He et al. [2015]), our scaling prevents the breakpoints of the neurons in the trained network from moving too much before we are able to decrease the training error sufficiently, and the impact of the magnitude by which we scale the hidden layer upon initialization on the dynamics of GF was studied in a similar univariate regression setting [Williams et al., 2019; Sahs et al., 2020]. We now conclude the discussion of our main result with the following remarks on the setting studied in our paper.…”
Section: Results (mentioning)
confidence: 99%
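
The breakpoints referred to in this excerpt are the spline knots of the paper's title: for a shallow univariate ReLU network f(x) = Σ_i v_i·relu(w_i·x + b_i), each hidden unit contributes a kink at x = -b_i/w_i, and the scale of the hidden-layer weights at initialization determines where those knots start. The following is a minimal illustrative sketch of that view, not code from the cited works; the function names and the Gaussian initialization with a `sigma_w` scale parameter are assumptions for illustration only.

# Minimal sketch (illustrative, not the authors' code): a shallow univariate
# ReLU network viewed as a piecewise-linear spline. Each hidden unit i
# contributes a breakpoint ("knot") at x = -b_i / w_i; `sigma_w` is a
# hypothetical stand-in for the hidden-layer initialization scale.
import numpy as np

rng = np.random.default_rng(0)

def init_shallow_relu(width, sigma_w=1.0, sigma_b=1.0):
    """Draw Gaussian parameters for a width-`width` shallow univariate ReLU net."""
    w = rng.normal(0.0, sigma_w, size=width)                # input-to-hidden weights
    b = rng.normal(0.0, sigma_b, size=width)                # hidden biases
    v = rng.normal(0.0, 1.0 / np.sqrt(width), size=width)   # hidden-to-output weights
    return w, b, v

def forward(x, w, b, v):
    """Evaluate f(x) = sum_i v_i * relu(w_i * x + b_i) for a 1-D array of inputs."""
    pre = np.outer(x, w) + b            # pre-activations, shape (n_points, width)
    return np.maximum(pre, 0.0) @ v     # piecewise-linear output, shape (n_points,)

def breakpoints(w, b):
    """Spline knots: unit i switches on/off where w_i * x + b_i = 0."""
    return -b / w

# A smaller sigma_w spreads the initial knots over a wider range of x,
# since -b_i / w_i grows as |w_i| shrinks.
w, b, v = init_shallow_relu(width=8, sigma_w=0.5)
xs = np.linspace(-3.0, 3.0, 7)
print("breakpoints:", np.sort(breakpoints(w, b)))
print("f(xs):      ", forward(xs, w, b, v))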
“…It is interesting that even when allowing for more input arguments, the resultant learned nonlinearities favor low-order quadratic functions (Figure 8b-d). This could be explained by an implicit bias toward smooth functions [45,46] while still bending the input space to provide useful computations. Perhaps the learned nonlinearities are as random as possible while fulfilling these minimal conditions.…”
Section: Discussion (mentioning)
confidence: 99%