2022
DOI: 10.3390/math10193556

Accelerating Extreme Search of Multidimensional Functions Based on Natural Gradient Descent with Dirichlet Distributions

Abstract: Attaining high accuracy with less complex neural network architectures remains one of the most important problems in machine learning. In many studies, the quality of recognition and prediction is improved by extending neural networks with ordinary or special neurons, which significantly increases training time. However, employing an optimization algorithm that brings the loss function into the neighborhood of the global minimum can reduce the number of layers and epochs. In…

Cited by 8 publications (9 citation statements)
References 19 publications
“…Such networks are part of developing quantum information theory and quantum computers, which makes the training process significantly faster. In recent research, vanilla natural gradient descent with Dirichlet distributions has already been tested on the Rastrigin and Rosenbrock functions in [108] and exploited in convolutional and recurrent neural networks in [109]. The important thing in this method is selecting an appropriate probability distribution, which can increase the rate of convergence and even accelerate the learning process.…”
Section: Natural Gradient Descent
confidence: 99%
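For context, the two benchmarks named above have the following standard forms; this is a minimal reference sketch of the test functions themselves, not code from [108] or [109].

import numpy as np

def rosenbrock(x, a=1.0, b=100.0):
    # Standard 2-D Rosenbrock benchmark; global minimum f = 0 at (a, a**2),
    # sitting inside a narrow, curved valley that slows plain gradient descent.
    return (a - x[0])**2 + b * (x[1] - x[0]**2)**2

def rastrigin(x, A=10.0):
    # Standard n-D Rastrigin benchmark; global minimum f = 0 at the origin,
    # surrounded by a regular grid of local minima.
    x = np.asarray(x, dtype=float)
    return A * x.size + np.sum(x**2 - A * np.cos(2.0 * np.pi * x))

Their contrasting landscapes (one narrow curved valley versus many regular local minima) are why these two functions are commonly used to probe the convergence-rate claims made for natural gradient methods.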
“…But by selecting an appropriate probability distribution, such as the Gaussian or Dirichlet, we can reduce the variable θ in the Fisher information matrix, which makes it possible to avoid calculating it at every iteration. Such an approach is realized in [108]-[111]. Natural gradient descent, based on the Fisher-Rao metric, can replace second-order optimization algorithms due to its rate of convergence and time consumption.…”
Section: Probability Density Function
confidence: 99%
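The point about avoiding a per-iteration Fisher computation can be illustrated with the Dirichlet case, whose Fisher information matrix has a closed form in terms of the trigamma function. The sketch below is a generic natural-gradient step under that assumption; it is not the specific algorithm of [108]-[111], and the learning rate and parameter names are illustrative.

import numpy as np
from scipy.special import polygamma  # polygamma(1, x) is the trigamma function

def dirichlet_fisher(alpha):
    # Closed-form Fisher information of Dirichlet(alpha):
    #   F_ij = trigamma(alpha_i) * delta_ij - trigamma(sum(alpha)),
    # so no sampling-based estimate of F is needed at each iteration.
    alpha = np.asarray(alpha, dtype=float)
    return np.diag(polygamma(1, alpha)) - polygamma(1, alpha.sum())

def natural_gradient_step(alpha, grad, lr=0.1):
    # One update alpha <- alpha - lr * F(alpha)^{-1} grad,
    # solving a linear system instead of forming an explicit inverse.
    F = dirichlet_fisher(alpha)
    return alpha - lr * np.linalg.solve(F, grad)

Because F here depends only on the distribution parameters α, the preconditioner comes essentially for free at every step, which is the computational advantage the quoted passage refers to.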
“…As a training algorithm, we use an optimizer known as SGD [10]. Ideally, this can be regarded as potential dynamics for each parameter, w,…”
Section: Model
confidence: 99%
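The "potential dynamics" reading of SGD is, in the usual formulation (not necessarily the exact notation of [10]), the gradient flow and its discretized update:

\frac{dw}{dt} = -\frac{\partial L}{\partial w}, \qquad w_{t+1} = w_t - \eta \, \nabla_w L(w_t),

with the loss L playing the role of a potential for the parameter w and η the learning rate; mini-batch sampling adds stochastic noise on top of this deterministic drift.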
“…We can update the network parameters along the gradient of the loss step by step. The training dynamics are generated through the update steps and are often stochastic, depending on the update procedure [10]. Regardless of this stochastic nature, we can derive a deterministic description of the training response at the ensemble level [11,12,15].…”
Section: Introduction
confidence: 99%
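A toy illustration of that ensemble-level determinism, assuming nothing about the actual model in the quoted work: each SGD run below minimizes a simple quadratic loss with artificial mini-batch noise, and averaging many runs recovers a smooth, deterministic trajectory. The quadratic loss, noise level, and seeds are all illustrative assumptions.

import numpy as np

def noisy_grad(w, rng, noise=0.5):
    # Gradient of a toy quadratic loss L(w) = 0.5 * ||w||^2, plus Gaussian noise
    # standing in for the stochasticity of mini-batch updates (illustrative only).
    return w + noise * rng.standard_normal(w.shape)

def sgd_path(w0, lr=0.1, steps=100, seed=0):
    # One stochastic training trajectory, recording the parameters at every step.
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    path = [w.copy()]
    for _ in range(steps):
        w = w - lr * noisy_grad(w, rng)
        path.append(w.copy())
    return np.array(path)

# Averaging many stochastic runs yields a deterministic ensemble-level response;
# for this toy loss it approaches the noiseless decay w0 * (1 - lr)**t.
ensemble_mean = np.mean([sgd_path([1.0, -2.0], seed=s) for s in range(64)], axis=0)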