2018
DOI: 10.1103/physreve.97.052307
Loss surface of XOR artificial neural networks

Abstract: Training an artificial neural network involves an optimization process over the landscape defined by the cost (loss) as a function of the network parameters. We explore these landscapes using optimization tools developed for potential energy landscapes in molecular science. The number of local minima and transition states (saddle points of index one), as well as the ratio of transition states to minima, grow rapidly with the number of nodes in the network. There is also a strong dependence on the regularizatio…
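The abstract's census of minima and transition states rests on classifying stationary points by Hessian index, the number of negative eigenvalues: index 0 is a minimum, index 1 a transition state. The sketch below, in Python/NumPy rather than the authors' GMIN/OPTIM code, sets up an XOR sum-of-squares loss for a small 2-2-1 sigmoid network with an optional L2 penalty and reads off the index of a candidate stationary point from a finite-difference Hessian; the network size, parameter packing, and step sizes are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Four XOR patterns: two binary inputs, one binary target each.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, lam=0.0):
    """Sum-of-squares loss of a 2-2-1 sigmoid network plus an optional L2 penalty.
    Assumed parameter packing (9 values): W1 (2x2), b1 (2), W2 (2), b2 (1)."""
    W1, b1, W2, b2 = w[:4].reshape(2, 2), w[4:6], w[6:8], w[8]
    h = sigmoid(X @ W1.T + b1)           # hidden activations, shape (4, 2)
    out = sigmoid(h @ W2 + b2)           # network output for the four patterns
    return 0.5 * np.sum((out - y) ** 2) + 0.5 * lam * np.dot(w, w)

def hessian_index(w, lam=0.0, eps=1e-4, tol=1e-6):
    """Count negative Hessian eigenvalues at w (index 0 = minimum,
    index 1 = transition state) using central finite differences."""
    n = w.size
    H = np.zeros((n, n))
    E = np.eye(n) * eps
    for i in range(n):
        for j in range(n):
            H[i, j] = (loss(w + E[i] + E[j], lam) - loss(w + E[i] - E[j], lam)
                       - loss(w - E[i] + E[j], lam) + loss(w - E[i] - E[j], lam)) / (4 * eps**2)
    eigvals = np.linalg.eigvalsh(0.5 * (H + H.T))   # symmetrise before diagonalising
    return int(np.sum(eigvals < -tol)), eigvals     # tolerance absorbs finite-difference noise
```

In practice w would first be converged to a stationary point (by minimisation or a transition-state search) before its index is read off.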

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1
1

Citation Types

2
21
0
1

Year Published

2020
2020
2023
2023

Publication Types

Select...
5
2
1

Relationship

0
8

Authors

Journals

Cited by 24 publications (24 citation statements).
References 98 publications.
“…Since the regularisation term is a convex L2 penalty, it is possible that part of the single-funnelled appearance of the reduced-connectivity networks is due purely to regularisation; i.e. higher L2 regularisation convexifies the landscape [15]. Again, for the fully-connected case, we observed a single-funnelled appearance, substantiating our previous suggestion that this type of landscape is architecture dependent.…”
Section: Landscapes With Reduced Connectivity (supporting)
confidence: 88%
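A small numerical illustration (mine, not from either paper) of the convexifying effect mentioned above: adding the penalty (λ/2)‖w‖² to the loss adds λ to every eigenvalue of its Hessian, so any λ larger than the most negative curvature encountered on the landscape leaves no negative or zero eigenvalues. The 3x3 matrix below is a made-up stand-in for a loss Hessian; NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))     # random orthogonal basis
H = Q @ np.diag([1.2, -0.4, 0.0]) @ Q.T          # stand-in Hessian: one negative, one zero eigenvalue
lam = 0.5                                        # L2 penalty strength
print(np.linalg.eigvalsh(H))                     # ~[-0.4, 0.0, 1.2]
print(np.linalg.eigvalsh(H + lam * np.eye(3)))   # ~[0.1, 0.5, 1.7]: no negative curvature left
```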
“…where c(α) is the known outcome for input data item α in the training set. The regularisation term biases against large values for the weights and shifts any zero eigenvalues of the Hessian (second derivative) matrix, which would otherwise complicate transition state searches [15,23]. To accelerate computation of the potential, a GPU version [24] of the loss function and gradient was also implemented and is available in the public domain GMIN and OPTIM programmes [25][26][27].…”
Section: Defining the Network (mentioning)
confidence: 99%
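The statement above refers to a GPU implementation of the loss and gradient distributed with GMIN and OPTIM; that code is not reproduced here. As a hedged stand-in, the sketch below writes the analytic gradient of an L2-regularised sum-of-squares loss for a minimal 2-2-1 sigmoid XOR network and checks it against finite differences; the architecture, parameter packing, and λ value are illustrative assumptions, not the published implementation.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
c = np.array([0.0, 1.0, 1.0, 0.0])      # known outcomes c(alpha) for the four patterns

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(w):
    # Assumed parameter packing (9 values): W1 (2x2), b1 (2), W2 (2), b2 (1).
    return w[:4].reshape(2, 2), w[4:6], w[6:8], w[8]

def loss(w, lam=0.01):
    W1, b1, W2, b2 = unpack(w)
    h = sigmoid(X @ W1.T + b1)
    out = sigmoid(h @ W2 + b2)
    return 0.5 * np.sum((out - c) ** 2) + 0.5 * lam * np.dot(w, w)

def grad(w, lam=0.01):
    """Analytic gradient of the regularised loss (plain backpropagation)."""
    W1, b1, W2, b2 = unpack(w)
    a1 = X @ W1.T + b1
    h = sigmoid(a1)
    a2 = h @ W2 + b2
    out = sigmoid(a2)
    d2 = (out - c) * out * (1.0 - out)       # dE/da2, shape (4,)
    gW2, gb2 = d2 @ h, d2.sum()
    d1 = np.outer(d2, W2) * h * (1.0 - h)    # dE/da1, shape (4, 2)
    gW1, gb1 = d1.T @ X, d1.sum(axis=0)
    return np.concatenate([gW1.ravel(), gb1, gW2, [gb2]]) + lam * w

# Check the analytic gradient against central finite differences.
w = np.random.default_rng(0).normal(size=9)
eps = 1e-6
numerical = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps) for e in np.eye(9)])
print(np.max(np.abs(numerical - grad(w))))   # should be tiny (finite-difference error only)
```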
“…The XOR problem requires the NN to model the "exclusive-or" logical gate using four binary patterns of two inputs and one output. Despite its seeming triviality, the XOR problem is not linearly separable, and thus makes a good case study for fundamental NN properties [26]. The MNIST dataset of handwritten digits [27] contains 70 000 examples of grey scale handwritten digits from 0 to 9, where 60 000 examples constitute the training set, and the remaining 10 000 constitute the test set.…”
Section: A. Benchmark Problems (mentioning)
confidence: 99%
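The claim that XOR is not linearly separable can be checked mechanically: strict separability of the four patterns is equivalent to the feasibility of the margin constraints w·x + b ≥ +1 for targets 1 and w·x + b ≤ −1 for targets 0. The sketch below (not from the cited benchmark code; SciPy's linprog is assumed available) poses that feasibility problem as a linear program and finds it infeasible.

```python
import numpy as np
from scipy.optimize import linprog

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Encode every pattern's sign constraint as A_ub @ [w1, w2, b] <= -1.
signs = np.where(y == 1, -1.0, 1.0)                 # flip rows that require ">= +1"
A_ub = signs[:, None] * np.hstack([X, np.ones((4, 1))])
b_ub = -np.ones(4)

res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * 3, method="highs")
print(res.status)    # 2 = infeasible: no separating hyperplane exists
```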
“…Along the same lines, artificial neural networks (ANNs) have been the most widely used soft-computing tool for tasks that require pattern recognition in a data set, such as images. An ANN is built from several layers and numbers of artificial neurons, which constitute the processing unit, whose mathematical model admits several data inputs and a single output that is a weighted combination of its inputs [14], [15]. Connecting several neurons within an ANN yields a powerful parallel computing tool, albeit one that delivers approximate rather than definitive outputs.…”
Section: Ta Perspectivas (unclassified)