2021
DOI: 10.1007/s00365-021-09545-2
Best k-Layer Neural Network Approximations

Abstract: We show that the empirical risk minimization (ERM) problem for neural networks has no solution in general. Given a training set s_1, …, s_n ∈ ℝ^p with corresponding responses t_1, …, t_n ∈ ℝ^q, fitting a k-layer neural network ν_θ : ℝ^p → ℝ^q involves estimation of the weights θ ∈ ℝ^m via an ERM: inf_{θ ∈ ℝ^m} Σ_{i=1}^n ‖t_i − ν_θ(s_i)‖₂². We show that even for k = 2, this infimum is not attainable in general for common activations like ReLU, hyperbolic tangent, and sigmoid functions. In addition, we deduce that if one attempts to minimize s…
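For reference, the abstract's k = 2 case corresponds to a standard two-layer parametrization (a sketch using common conventions; details such as bias placement and how layers are counted are my assumption, not taken from the paper):

  ν_θ(s) = W₂ σ(W₁ s + b₁) + b₂,   θ = (W₁, b₁, W₂, b₂) ∈ ℝ^m,

where σ is the activation (ReLU, tanh, or sigmoid) applied entrywise and m counts all entries of the weight matrices and bias vectors.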

Cited by 3 publications (3 citation statements) · References 16 publications
“…[12, Lemma 27.3]) such that there exists a network configuration achieving zero error and, thus, a global minimum in the search space. For shallow feedforward ANNs using ReLU activation it has been shown that also in the underparametrized regime there exists a global minimum if the ANN has a one-dimensional output [18], whereas there are pathological counterexamples in higher dimensions [19]. However, for general measures µ not necessarily consisting of a finite number of Dirac measures, the literature on the existence of global minima is very limited.…”
Section: Introduction
confidence: 99%
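The zero-error configuration mentioned in the statement above is easy to exhibit in the simplest setting. The following is a minimal sketch (my own 1-D illustration with one-dimensional output and made-up data, not the construction of [12, Lemma 27.3] or [18]): a shallow ReLU network with one hidden unit per breakpoint interpolates n distinct scalar inputs exactly, so the empirical risk attains its global minimum of zero.

import numpy as np

# Made-up 1-D training data (illustrative only, not from any cited paper).
x = np.array([0.0, 1.0, 2.0, 3.5, 5.0])   # inputs s_1, ..., s_n
y = np.array([1.0, -1.0, 0.5, 2.0, 0.0])  # responses t_1, ..., t_n

# Piecewise-linear interpolant written as a shallow ReLU network:
# nu(t) = y_1 + sum_j c_j * ReLU(t - x_j), one hidden unit per breakpoint.
slopes = np.diff(y) / np.diff(x)                    # slope of each segment
c = np.concatenate(([slopes[0]], np.diff(slopes)))  # outer-layer weights

def relu_net(t):
    t = np.atleast_1d(t)
    hidden = np.maximum(t[:, None] - x[:-1][None, :], 0.0)  # hidden-layer activations
    return y[0] + hidden @ c

print(relu_net(x) - y)                 # ~0: the network interpolates the data
print(np.sum((relu_net(x) - y) ** 2))  # empirical risk ~0, so the infimum is attained

Once the number of hidden units is too small for such interpolation (the underparametrized regime), attainability of the infimum is no longer automatic, which is the contrast drawn in the statement above.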
“…This phenomenon can also be observed in empirical risk minimization for the hyperbolic tangent activation. As shown in [19], in the underparametrized setting there exist input data such that, for all output data from a set of positive Lebesgue measure, no minimizer exists in the optimization landscape.…”
Section: Introduction
confidence: 99%
“…This phenomenon can also be observed in empirical risk minimization for the hyperbolic tangent activation. As shown in [LMQ22], in the underparametrized setting there exist input data such that, for all output data from a set of positive Lebesgue measure, no minimizer exists in the optimization landscape. It remains an open problem whether this phenomenon prevails for ReLU activation.…”
Section: Introduction
confidence: 99%
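A standard mechanism behind such non-attainability (a generic sketch, not necessarily the specific construction used in [19]/[LMQ22]) is that the closure of the model class contains functions that no finite parameter value realizes. For the hyperbolic tangent, for instance,

  a · ( tanh(x + 1/a) − tanh(x) ) → tanh′(x)   as a → ∞,

so a two-neuron tanh network approximates the bump-shaped function tanh′ arbitrarily well, but, since tanh′ is not itself of that form, only along parameter sequences that leave every bounded set. This is the prototypical way the situation described above can arise: responses lying on (or near) such a limit function admit minimizing sequences with diverging weights, while no actual minimizer need exist.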