Power-law scaling to assist with key challenges in artificial intelligence

Meir, Yuval; Sardi, Shira; Hodassman, Shiri; Kisos, Karin; Ben-Noam, Itamar; Goldental, Amir; Kanter, Ido

doi:10.1038/s41598-020-76764-1

Cited by 14 publications

(7 citation statements)

References 30 publications


“…Models that recognize objects from images require about a 500-fold increase in resources to only double their intelligence [4]. Other studies reached similar conclusions [5,6] (see [7] for a review of how machine learning engineers and data scientists cope in practice with these limitations of deep learning models).…”

Section: Introduction

mentioning

confidence: 70%

Guided Transfer Learning

Nikolić, Andrić, Nikolić

2023

Preprint


Machine learning requires exuberant amounts of data and computation. Also, models require equally excessive growth in the number of parameters. It is, therefore, sensible to look for technologies that reduce these demands on resources. Here, we propose an approach called guided transfer learning. Each weight and bias in the network has its own guiding parameter that indicates how much this parameter is allowed to change while learning a new task. Guiding parameters are learned during an initial scouting process. Guided transfer learning can result in a reduction in resources needed to train a network. In some applications, guided transfer learning enables the network to learn from a small amount of data. In other cases, a network with a smaller number of parameters can learn a task which otherwise only a larger network could learn. Guided transfer learning potentially has many applications when the amount of data, model size, or the availability of computational resources reach their limits.
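The per-parameter guiding idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how a guiding parameter enters the update rule, so the elementwise scaling of the gradient step below is an assumption, as are the function name `guided_sgd_step`, the learning rate, and the toy values. The scouting process that learns the guides is not modelled here; the guides are simply given.

```python
def guided_sgd_step(params, grads, guides, lr=0.1):
    """One gradient step in which each parameter's update is scaled by its guide.

    Assumption (not from the abstract): a guide of 0.0 freezes a parameter,
    and a guide of 1.0 leaves it fully trainable.
    """
    return [p - lr * g * d for p, g, d in zip(params, guides, grads)]


# Toy values, chosen only for illustration.
params = [1.0, -2.0, 0.5]
grads = [0.4, 0.4, 0.4]
guides = [0.0, 0.5, 1.0]  # hypothetical output of a scouting phase

updated = guided_sgd_step(params, grads, guides)
```

With these toy values the first parameter stays fixed, the second moves by half of a plain SGD step, and the third moves by the full step.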


“…3). We note that the presented power law as a function of the depth of the architecture differs from the power law behavior for SRs as a function of the dataset size [29–33].…”

Section: Discussion

mentioning

confidence: 99%

“…Another possible mechanism is the addition of a super-linear number of cross-weights to the filters. This represents a biological realization because cross-weights result as a byproduct of dendritic nonlinear amplification [17, 29, 34, 35]. Nevertheless, these possible enhanced ρ mechanisms significantly increase computational complexity and are mentioned for their potential biological relevance, limited number of layers, and the natural emergence of many cross-weights.…”

Section: Discussion

mentioning

confidence: 99%

Efficient shallow learning as an alternative to deep learning

Meir, Tevet, Tzach, et al. 2023

Sci Rep


The realization of complex classification tasks requires training of deep learning (DL) architectures consisting of tens or even hundreds of convolutional and fully connected hidden layers, which is far from the reality of the human brain. According to the DL rationale, the first convolutional layer reveals localized patterns in the input and large-scale patterns in the following layers, until it reliably characterizes a class of inputs. Here, we demonstrate that with a fixed ratio between the depths of the first and second convolutional layers, the error rates of the generalized shallow LeNet architecture, consisting of only five layers, decay as a power law with the number of filters in the first convolutional layer. The extrapolation of this power law indicates that the generalized LeNet can achieve small error rates that were previously obtained for the CIFAR-10 database using DL architectures. A power law with a similar exponent also characterizes the generalized VGG-16 architecture. However, this results in a significantly increased number of operations required to achieve a given error rate with respect to LeNet. This power law phenomenon governs various generalized LeNet and VGG-16 architectures, hinting at its universal behavior and suggesting a quantitative hierarchical time–space complexity among machine learning architectures. Additionally, the conservation law along the convolutional layers, which is the square-root of their size times their depth, is found to asymptotically minimize error rates. The efficient shallow learning that is demonstrated in this study calls for further quantitative examination using various databases and architectures and its accelerated implementation using future dedicated hardware developments.
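A power-law decay of the error rate with the number of first-layer filters is a straight line on a log-log plot, so its exponent can be estimated by linear regression on the logarithms. The sketch below is illustrative only: the functional form `error = a * d**(-rho)`, the symbol names, and the data points are assumptions, and the numbers are synthetic rather than taken from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**(-rho) in log-log space; returns (a, rho)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    slope = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y)) / sum(
        (u - mean_x) ** 2 for u in log_x
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def filters_for_error(a, rho, target_error):
    """Extrapolate the fitted law: number of filters d at which a * d**(-rho) = target_error."""
    return (a / target_error) ** (1.0 / rho)


# Synthetic data generated from a known law (a = 0.5, rho = 0.4), not measured results.
filters = [6, 12, 24, 48, 96]
errors = [0.5 * d ** -0.4 for d in filters]

a, rho = fit_power_law(filters, errors)
needed = filters_for_error(a, rho, target_error=0.05)
```

Because the synthetic points lie exactly on a power law, the fit recovers the generating values. Real error rates would scatter around the line, and an extrapolation such as `filters_for_error` is only as reliable as the assumption that the law continues to hold outside the measured range.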


“…1. This relation advises on the suitable rescaling of the dataset size (M ∝ r⁻²), as the dataset quality is impaired (r → 0), in order to preserve network's abilities; note that power-law scalings were already evidenced in the machine-learning context, see, e.g., [18]. To achieve a quantitative picture and control of the network behavior, we work out a statistical-mechanics investigation and we start by introducing the Boltzmann-Gibbs measure for the system,…”

mentioning

confidence: 99%

Supervised Hebbian learning

Alemanno, Miriam, Kanter, et al. 2023

EPL

Self Cite


In the neural network literature, "Hebbian learning" traditionally refers to the procedure by which the Hopfield model and its generalizations "store" archetypes (i.e., definite patterns that are experienced just once to form the synaptic matrix). However, the term "learning" in Machine Learning refers to the ability of the machine to extract features from the supplied dataset (e.g., made of blurred examples of these archetypes), in order to make its own representation of the unavailable archetypes. Here, given a sample of examples, we define a supervised learning protocol based on Hebb's rule by which the Hopfield network can infer the archetypes. By an analytical inspection, we detect the correct control parameters (including size and quality of the dataset) that tune the system performance and we depict its phase diagram. We also prove that, for structureless datasets, the Hopfield model equipped with this supervised learning rule is equivalent to a restricted Boltzmann machine, and this suggests an optimal and interpretable training routine. Finally, this approach is generalized to structured datasets: we highlight an ultrametric-like organization (reminiscent of replica-symmetry-breaking) in the analyzed datasets and, consequently, we introduce an additional "broken-replica hidden layer" for its (partial) disentanglement, which is shown to improve MNIST classification from 75% to 95%, and to offer a new perspective on deep architectures.

