2022
DOI: 10.48550/arxiv.2202.06236
Preprint

Efficient Natural Gradient Descent Methods for Large-Scale Optimization Problems

Abstract: We propose an efficient numerical method for computing natural gradient descent directions with respect to a generic metric in the state space. Our technique relies on representing the natural gradient direction as a solution to a standard least-squares problem. Hence, instead of calculating, storing, or inverting the information matrix directly, we apply efficient methods from numerical linear algebra to solve this least-squares problem. We treat both scenarios where the derivative of the state variable with …
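A minimal sketch of the idea summarized in the abstract: instead of forming or inverting the information matrix, the natural gradient direction is obtained from a standard least-squares solve. The sketch below is illustrative, not the authors' implementation; it assumes a Euclidean metric in the state space (so the information matrix is J^T J), and the names natural_gradient_direction, jvp, and vjp are made up for the example. It uses SciPy's LSQR driven only by Jacobian-vector products.

```python
# Sketch: natural gradient direction via a least-squares solve (assumption:
# Euclidean state-space metric, so the information matrix is G = J^T J with
# J = d(state)/d(theta)). The natural gradient d solves G d = J^T r, i.e.
# d = argmin_d ||J d - r||_2, which an iterative solver (LSQR) handles using
# only J-vector and J^T-vector products, never forming or inverting G.
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

def natural_gradient_direction(jvp, vjp, r, n_params):
    """jvp(v) = J @ v, vjp(w) = J.T @ w, r = residual in the state space."""
    J_op = LinearOperator((r.size, n_params), matvec=jvp, rmatvec=vjp)
    # Solve min_d ||J d - r||_2 without assembling J^T J.
    d, *_ = lsqr(J_op, r, atol=1e-8, btol=1e-8)
    return d

# Toy usage with an explicit Jacobian (illustration only).
rng = np.random.default_rng(0)
J = rng.standard_normal((200, 10))
r = rng.standard_normal(200)
d = natural_gradient_direction(lambda v: J @ v, lambda w: J.T @ w, r, 10)
# Agrees with the normal-equations solution (J^T J)^{-1} J^T r.
assert np.allclose(d, np.linalg.solve(J.T @ J, J.T @ r), atol=1e-6)
```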

Cited by 2 publications (2 citation statements)
References 25 publications
“…The adjoint-state method is an efficient technique by which we can evaluate the derivative ∂_θ J, as the computation time is largely independent of the size of θ. One can derive the adjoint-state method for gradient computations by differentiating the discrete constraint [52], which in our case is the eigenvector problem…”
Section: Gradient Calculation Through the Adjoint-State Method (mentioning)
confidence: 99%
“…It could also lead to different gradient flow formulations when passing the Fréchet derivative to the gradient, thus giving rise to different gradient descent algorithms for solving such nonconvex optimization problems. Both choices affect the convergence rate and potentially change the stationary points to which the iterative gradient-based algorithm converges, even with the same initial guess [34]. We will demonstrate this later in section 6.…”
Section: Introduction (mentioning)
confidence: 98%
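A small assumed example (not drawn from the cited works) of the point made in this statement: converting the same Fréchet derivative into a gradient under two different inner products gives two different descent directions, and hence gradient flows with different convergence behavior.

```python
# Sketch: for f(x) = 0.5 x^T Q x, the Euclidean gradient is Q x, while under
# the inner product <u, v>_M = u^T M v the gradient is M^{-1} Q x. The metric
# choice changes the descent direction and the convergence rate.
import numpy as np

Q = np.diag([100.0, 1.0])            # ill-conditioned quadratic
M = Q.copy()                         # metric chosen to match the curvature
x0 = np.array([1.0, 1.0])
x_euc, x_met = x0.copy(), x0.copy()
step = 5e-3

for _ in range(100):
    x_euc = x_euc - step * (Q @ x_euc)                    # Euclidean descent
    x_met = x_met - step * np.linalg.solve(M, Q @ x_met)  # metric-weighted descent

print("Euclidean:", x_euc)  # very uneven progress across coordinates
print("Metric   :", x_met)  # same contraction rate in every direction
```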