2023 · Preprint
DOI: 10.48550/arxiv.2302.14690
On the existence of minimizers in shallow residual ReLU neural network optimization landscapes

Abstract: Many mathematical convergence results for gradient descent (GD) based algorithms employ the assumption that the GD process is (almost surely) bounded, and, in concrete numerical simulations as well, divergence of the GD process may slow down, or even entirely rule out, convergence of the error function. In practically relevant learning problems, it thus seems advisable to design the ANN architectures so that GD optimization processes remain bounded. The property of the boundedness of GD processes fo…
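For orientation, here is a minimal sketch of the kind of shallow residual ReLU architecture the title refers to: an identity skip connection added to a one-hidden-layer ReLU network. This is our own illustration, not code from the paper; the dimensions, parameter names, and parameterization are assumptions.

```python
# A minimal sketch (illustrative, not from the paper) of a shallow residual
# ReLU network realization: f(x) = x + V @ relu(W @ x + b), i.e. an identity
# skip connection plus a single ReLU hidden layer.
import numpy as np

rng = np.random.default_rng(0)
d, width = 1, 16                       # input dimension and hidden width (assumed)
W = rng.normal(size=(width, d))        # hidden-layer weights
b = rng.normal(size=width)             # hidden-layer biases
V = rng.normal(size=(d, width))        # output-layer weights

def residual_relu_net(x):
    """Realization of a shallow residual ReLU network: identity skip
    connection plus one ReLU hidden layer."""
    hidden = np.maximum(W @ x + b, 0.0)
    return x + V @ hidden              # residual (skip) connection

print(residual_relu_net(np.array([0.3])))
```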

Cited by 2 publications (10 citation statements) · References 11 publications
“…We distinguish cases. If one has m_0 = 1 (case (a)) or a ≡ b ∈ ℝ (case (c)), then the proof of Proposition 3.3 in [DJK23] shows that an appropriate replacement of a summand of multiplicity two by two summands of multiplicity one reduces the error, which shows (9).…”
Section: Strict Generalized Responses Are Not Better Than Representab…
Confidence: 97%
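To make the quoted splitting idea concrete, here is a toy numeric check (our own illustration; the target function and grid are hypothetical, and the code is not from [DJK23] or the citing paper): two ReLU kinks forced onto the same breakpoint, i.e. a summand of multiplicity two, realize the same functions as a single kink, so letting the two breakpoints separate into two summands of multiplicity one can only reduce the L2 error, and for a strictly convex target it does so strictly.

```python
# Toy illustration: fitting c1*relu(x - t1) + c2*relu(x - t2) to a strictly
# convex target. With t1 == t2 the two summands collapse to one kink of
# multiplicity two; distinct breakpoints (multiplicity one each) fit better.
import numpy as np

x = np.linspace(0.0, 1.0, 2001)
target = x ** 2  # hypothetical strictly convex target

def best_l2_error(t1, t2):
    """Least-squares fit of c1*relu(x-t1) + c2*relu(x-t2) to the target."""
    A = np.stack([np.maximum(x - t1, 0.0), np.maximum(x - t2, 0.0)], axis=1)
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    residual = target - A @ coef
    return np.sqrt(np.mean(residual ** 2))

ts = np.linspace(0.05, 0.95, 19)
err_mult2 = min(best_l2_error(t, t) for t in ts)             # shared breakpoint
err_mult1 = min(best_l2_error(t1, t2)                        # split breakpoints
                for t1 in ts for t2 in ts if t1 < t2)
print(f"multiplicity two: {err_mult2:.5f}, split kinks: {err_mult1:.5f}")
assert err_mult1 < err_mult2  # splitting the multiplicity-two summand helps
```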
“…We will work with more intuitive geometric descriptions of realization functions of networks W ∈ W_d as introduced in [DJK23]. We call a network W ∈ W_d non-degenerate iff for all j = 1, …”
Section: Generalized Response Of Neural Network
Confidence: 99%