2022
DOI: 10.1007/s00033-022-01716-w

A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

Abstract: In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural networks consist of one input layer, one hidden layer, and one output layer (with $d \in \mathbb{N}$ …
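The setting described in the abstract lends itself to a small numerical illustration. The sketch below is a hedged demonstration, not the paper's construction: it trains a one-hidden-layer ReLU network with plain SGD on the constant target f*(x) = 1, and the width, step size, batch size, and uniform sampling distribution are all assumptions chosen for illustration. The paper proves that the risk converges to zero in this setting; the code merely shows the empirical risk decaying.

```python
import numpy as np

# Hypothetical sketch of the paper's setting: a fully connected one-hidden-layer
# ReLU network trained by plain SGD on the constant target f*(x) = 1 over [0,1]^d.
# Width, step size, batch size, and sampling are illustrative assumptions.

rng = np.random.default_rng(0)
d, width = 2, 16                    # input dimension and hidden width
lr, steps, batch = 0.05, 2001, 32

W = rng.normal(size=(width, d))     # hidden-layer weights
b = rng.normal(size=width)          # hidden-layer biases
v = rng.normal(size=width) / np.sqrt(width)  # output weights
c = 0.0                             # output bias

for t in range(steps):
    X = rng.uniform(size=(batch, d))        # i.i.d. uniform samples on [0,1]^d
    y = np.ones(batch)                      # constant target f*(x) = 1
    Z = X @ W.T + b                         # pre-activations, shape (batch, width)
    H = np.maximum(Z, 0.0)                  # ReLU
    err = H @ v + c - y                     # residuals of the squared loss
    # Gradients of the empirical risk 0.5 * mean(err**2)
    G = (err[:, None] * v) * (Z > 0.0)      # back-propagate through the ReLU
    W -= lr * (G.T @ X) / batch
    b -= lr * G.mean(axis=0)
    v -= lr * (H.T @ err) / batch
    c -= lr * err.mean()
    if t % 500 == 0:
        print(f"step {t:4d}   empirical risk {0.5 * (err**2).mean():.6f}")
```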

Cited by 7 publications (5 citation statements) · References 32 publications
“…The previous paper [24] contains comparable results for 1d shallow networks. Similar approximation results for gradient flow trained shallow 1d networks are in [33,31], with slightly different assumptions on the target f, more general probability weighted L^2 loss and an alternative proof technique. Other approximation and optimization guarantees rely on alternative optimizers.…”
Section: Literature Review (mentioning)
confidence: 55%
“…Due to the over-parametrized regime, these optimization results achieve zero training error in discrete sample norms and are therefore not immediately compatible with the approximation literature. There are relatively few papers [1,21,42,15,24,26,30,23,45] that consider approximation and optimization simultaneously.…”
Section: Introduction (mentioning)
confidence: 99%
“…The previous paper [24] contains comparable results for 1d shallow networks. Similar approximation results for gradient flow trained shallow 1d networks are in [30,32], with slightly different assumptions on the target f, more general probability weighted L^2 loss and an alternative proof technique. Other approximation and optimization guarantees rely on alternative optimizers.…”
Section: Literature Review (mentioning)
confidence: 62%
“…• Gradient descent or gradient flow error bounds in continuous L^2 norms can be found in [30,32], and [17,37]. The first set of papers uses more general L^2(P) losses, weighted by a probability measure P of the training samples. For deep networks, they show that the loss converges to zero if the learning target f is piecewise polynomial, and for shallow networks if the target is an increasing function.…”
Section: Literature Review (mentioning)
confidence: 99%
“…The way the validation set is divided may result in a large variance in the validation scores. In this case, the best practice is to use the K-fold cross-validation method [13]. This method divides the available data into K partitions (K is usually 4 or 5) and instantiates K identical models.…”
Section: Model Verification (mentioning)
confidence: 99%
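A minimal sketch of the K-fold procedure described in the statement above, assuming scikit-learn is available; the toy data, the Ridge model, and the MSE score are placeholders standing in for whatever model and metric the cited work actually validates:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Toy regression data; in practice X and y come from the problem at hand.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

K = 5                                     # K is usually 4 or 5, as noted above
scores = []
for train_idx, val_idx in KFold(n_splits=K, shuffle=True, random_state=0).split(X):
    model = Ridge()                       # a fresh, identical model for each fold
    model.fit(X[train_idx], y[train_idx])
    scores.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

# Averaging over the K folds reduces the variance caused by any single split.
print(f"mean validation MSE over {K} folds: {np.mean(scores):.4f}")
```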