1993
DOI: 10.1209/0295-5075/21/8/013

Optimal Learning with a Neural Network

Abstract: We introduce optimal learning with a neural network, which we define as minimising the expected generalisation error. We find that the optimally trained spherical perceptron may learn a linearly separable rule as well as any possible network. We sketch an algorithm to generate optimal learning, and simulation results support our conclusions. Optimal learning of a well-known, significant unlearnable problem, the …
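The "optimal learning" of the abstract is Bayes-optimal generalisation of a linearly separable rule, which the citing papers below describe via the version space and its centre of mass (Watkin's 'samplers' of the version space). The following is a minimal illustrative sketch of that idea only; the toy dimensions, rejection-sampling scheme, and all variable names are assumptions for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy teacher-student setup (dimensions kept small so naive rejection sampling works).
N, P = 6, 12                          # input dimension, number of training examples
teacher = rng.standard_normal(N)      # the linearly separable rule to be learned
X = rng.standard_normal((P, N))       # training inputs
y = np.sign(X @ teacher)              # training labels given by the teacher

def sample_version_space(n_samples=500, batch=50_000):
    """Rejection-sample unit weight vectors that classify every training example correctly."""
    kept = []
    while len(kept) < n_samples:
        W = rng.standard_normal((batch, N))
        consistent = np.all(np.sign(W @ X.T) == y, axis=1)
        hits = W[consistent]
        if hits.size:
            kept.extend(hits / np.linalg.norm(hits, axis=1, keepdims=True))
    return np.array(kept[:n_samples])

# Approximate the Bayes-optimal student as the centre of mass of the version space.
samples = sample_version_space()
w_bayes = samples.mean(axis=0)
w_bayes /= np.linalg.norm(w_bayes)

def gen_error(w):
    """Generalisation error vs. the teacher on Gaussian inputs: eps = angle(w, teacher) / pi."""
    overlap = w @ teacher / np.linalg.norm(teacher)
    return np.arccos(np.clip(overlap, -1.0, 1.0)) / np.pi

print("single consistent perceptron:", gen_error(samples[0]))
print("centre-of-mass (Bayes) estimate:", gen_error(w_bayes))
```

In this toy run the averaged vector typically generalises better than any single sampled perceptron, which is the effect the abstract's "optimal learning" refers to.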

Cited by 47 publications (29 citation statements). References 7 publications.
“…With this cost function, the optimal generalizer may be found by a simple gradient descent, with neither the need to train an infinite number of perceptrons for implementing a committee machine, as was suggested by Opper and Haussler [8], nor to determine a large number of 'samplers' of the version space, as proposed by Watkin [10]. Once the potential is known, it is straightforward to calculate the distribution of stabilities of the training set:…”
Section: Theoretical Results
confidence: 99%
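The procedure described in this snippet is gradient descent on a cost that is a potential of the pattern stabilities, followed by reading off the distribution of those stabilities. The sketch below illustrates that procedure with a generic smooth placeholder potential; the specific optimal potential derived in the citing paper is not reproduced in the snippet, so V, beta, the learning rate, and the dimensions here are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 50, 100                        # toy input dimension and training-set size
teacher = rng.standard_normal(N)
X = rng.standard_normal((P, N))
y = np.sign(X @ teacher)

def stabilities(w):
    """Stabilities gamma_mu = y_mu (w . x_mu) / |w| of the training patterns."""
    return y * (X @ w) / np.linalg.norm(w)

# Placeholder potential: smooth and decreasing in the stability (softplus of -beta*gamma).
beta = 2.0
def V(g):
    return np.log1p(np.exp(-beta * g)) / beta
def dV(g):
    return -1.0 / (1.0 + np.exp(beta * g))        # V'(gamma)

def grad_cost(w):
    """Gradient of E(w) = sum_mu V(gamma_mu) with respect to w."""
    nw = np.linalg.norm(w)
    g = stabilities(w)
    # d gamma_mu / d w = y_mu x_mu / |w| - gamma_mu w / |w|^2
    dgamma = (y[:, None] * X) / nw - np.outer(g, w) / nw**2
    return (dV(g)[:, None] * dgamma).sum(axis=0)

# Plain gradient descent on the cost, keeping w on the sphere |w| = sqrt(N).
w = rng.standard_normal(N)
w *= np.sqrt(N) / np.linalg.norm(w)
print("cost before:", V(stabilities(w)).sum())
for _ in range(500):
    w = w - 0.02 * grad_cost(w)
    w *= np.sqrt(N) / np.linalg.norm(w)
print("cost after:", V(stabilities(w)).sum())

# Distribution of stabilities of the training set after training (cf. the quoted remark).
counts, edges = np.histogram(stabilities(w), bins=20)
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"gamma in [{lo:5.2f}, {hi:5.2f}): {c}")
```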
“…The fact that the Bayesian student has patterns at vanishing distance from the hyperplane, and has most patterns at distances larger than κ, allows us to conclude that its weight vector lies close to the boundary of the version space. It has been shown [10] that the Bayesian weight vector is the barycenter of the (strictly convex) version space. Our result means that the barycenter of the version space is far from its center, which is rather surprising, and might indicate that the version space is highly non-spherical.…”
Section: Theoretical Results
confidence: 99%
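For reference, the two geometric objects in this remark can be stated compactly; the notation below is assumed for illustration rather than taken from the snippet. With unit-length students $\mathbf{J}$ and training examples $(\boldsymbol{\xi}_\mu,\sigma_\mu)$, the version space and its barycentre (the Bayesian weight vector referred to above) are

$$
V=\bigl\{\mathbf{J}:\ \|\mathbf{J}\|=1,\ \ \sigma_\mu\,\mathbf{J}\cdot\boldsymbol{\xi}_\mu>0\ \ \forall\,\mu\bigr\},
\qquad
\mathbf{J}_{\mathrm{B}}=\frac{\int_{V}\mathbf{J}\,\mathrm{d}\mathbf{J}}{\bigl\|\int_{V}\mathbf{J}\,\mathrm{d}\mathbf{J}\bigr\|},
$$

and the distance of pattern $\mu$ from the hyperplane of a unit-length student $\mathbf{J}$ is $|\mathbf{J}\cdot\boldsymbol{\xi}_\mu|$, which is the quantity the snippet compares with the margin $\kappa$.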
“…ANN application areas include data classification and pattern recognition (Ripley, 1996), damage detection and earthquake simulation (Pei et al., 2006), function approximation (Toh, 1999; Ye and Lin, 2003), material science (Bhadeshia, 1999), experimental design of engineering systems (Röpke et al., 2005), nonlinear optimization (Malek et al., 2010), polypeptide structure prediction (Dorn and de Souza, 2010), prediction of trading signals of stock market indices (Tilakaratne et al., 2008), regression analysis (De Veux et al., 1998), signal and image processing (Watkin, 1993; Masters, 1994), and time series analysis and forecasting (Franses and van Dijk, 2000; Kajitani et al., 2005).…”
Section: Discussion
confidence: 99%
“…ANN model parameterization frameworks and numerical studies are presented and discussed, e.g., by Watkin (1993), Prechelt (1994), Bianchini and Gori (1996), Sexton et al. (1998), Jordanov and Brown (1999), Toh (1999), Ye and Lin (2003), Abraham (2004), and Hamm et al. (2007).…”
Section: Postulating and Calibrating a Model Instance
confidence: 99%