Quickly Finding a Benign Region via Heavy Ball Momentum in Non-Convex Optimization

Wang, Jun-Kun; Abernethy, Jacob

doi:10.48550/arxiv.2010.01449

Cited by 1 publication

(4 citation statements)

References 53 publications

(68 reference statements)


“…We can now move towards lines (49)(50)(51)(52) and work on simplifying these equations. We start with line (49):…”

Section: DMFT for Nesterov Acceleration (mentioning)

confidence: 99%

“…Where line (56) follows by the definition of response function in y. Moving towards lines (50,51), and carefully taking into account the permutations, we obtain…”

Section: DMFT for Nesterov Acceleration (mentioning)

confidence: 99%

“…Which algorithm is the best in practice seems not to have a simple answer and there are instances where a class of algorithms outperforms the other and vice-versa [26]. Most of the theoretical literature on momentum-based methods concerns convex problems [18,23,24,30,49] and, despite these methods have been successfully applied to a variety of problems, only recently high dimensional non-convex settings have been considered [22,51,52]. Furthermore, with few exceptions [45], the majority of these studies focus on worst-case analysis while empirically one could also be interested in the behaviour of such algorithms on typical instances of the optimization problem, when this is extracted from a probability distribution.…”

Section: Introduction (mentioning)

confidence: 99%

“…We apply our equations to the spiked matrix-tensor model which displays a similar phenomenology as the one described in [51] for the phase retrieval problem: all algorithms have two dynamical regimes. First, they navigate in the non-convex landscape and, second, if the signal to noise ratio is strong enough, the dynamics eventually enters in the basin of attraction of the signal and rapidly reaches the bottom of the cost function.…”

Section: Introduction (mentioning)

confidence: 99%


Analytical Study of Momentum-Based Acceleration Methods in Paradigmatic High-Dimensional Non-Convex Problems

Mannelli, Urbani

2021

Preprint


When optimizing a loss function, it is common practice to use momentum-based accelerated methods rather than vanilla gradient-based methods. Despite being widely applied to arbitrary loss functions, their behaviour in generically non-convex, high-dimensional landscapes is poorly understood. In this work we use dynamical mean field theory techniques to describe analytically the average behaviour of these methods in a prototypical non-convex model: the (spiked) matrix-tensor model. We derive a closed set of equations that describe the behaviour of several algorithms, including heavy-ball momentum and Nesterov acceleration. Additionally, we characterize the evolution of a mathematically equivalent physical system of massive particles relaxing toward the bottom of an energetic landscape. Under the correct mapping the two dynamics are equivalent, and a large mass increases the effective time step of the heavy-ball dynamics, leading to a speed-up.
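For readers unfamiliar with the two algorithms named in the abstract, the following is a minimal sketch of their common textbook update rules. It is not the authors' dynamical mean field theory equations, and it does not use the spiked matrix-tensor model. The toy quadratic loss, the function names, and the values of `lr` and `beta` are all illustrative assumptions.

```python
# Textbook heavy-ball and Nesterov updates on a toy quadratic loss.
# Assumption: f(x) = 0.5 * sum(a_i * x_i**2), chosen only for illustration;
# the abstract's analysis concerns a non-convex model instead.

def grad_quadratic(x, curvatures):
    """Gradient of the toy loss f(x) = 0.5 * sum(a_i * x_i**2)."""
    return [a * xi for a, xi in zip(curvatures, x)]


def heavy_ball(grad, x0, lr, beta, steps):
    """Heavy-ball momentum: the gradient is evaluated at the current point."""
    x = list(x0)
    v = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        v = [beta * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


def nesterov(grad, x0, lr, beta, steps):
    """Nesterov acceleration: the gradient is evaluated at a look-ahead point."""
    x = list(x0)
    v = [0.0] * len(x)
    for _ in range(steps):
        lookahead = [xi + beta * vi for xi, vi in zip(x, v)]
        g = grad(lookahead)
        v = [beta * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


# Hypothetical settings: two directions with curvatures 1 and 10.
curvatures = [1.0, 10.0]
grad = lambda x: grad_quadratic(x, curvatures)
x_hb = heavy_ball(grad, [1.0, 1.0], lr=0.05, beta=0.9, steps=500)
x_nag = nesterov(grad, [1.0, 1.0], lr=0.05, beta=0.9, steps=500)
```

The only difference between the two loops is where the gradient is evaluated. On this convex toy problem both iterates approach the minimizer at the origin; the behaviour on non-convex landscapes is the subject of the abstract and is not reproduced here.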

