2019
DOI: 10.1080/10556788.2019.1658107

A robust multi-batch L-BFGS method for machine learning

Abstract: This paper describes an implementation of the L-BFGS method designed to deal with two adversarial situations. The first occurs in distributed computing environments where some of the computational nodes devoted to the evaluation of the function and gradient are unable to return results on time. A similar challenge occurs in a multi-batch approach in which the data points used to compute function and gradients are purposely changed at each iteration to accelerate the learning process. Difficulties arise because…

Cited by 63 publications (99 citation statements)
References 40 publications

“…Also, the experience memory D is emptied after each gradient computation, so the algorithm needs much less RAM. Inspired by [50], we use the overlap between consecutive multi-batch samples, O_k = J_k ∩ J_{k+1}, to compute y_k as…”
Section: L-BFGS Line-search Deep Q-learning Methods
confidence: 99%
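
A minimal sketch of the overlap-based curvature-pair computation quoted above, in Python with NumPy. The grad(w, idx) mini-batch gradient helper and all variable names are illustrative assumptions, not code from the cited implementations.

```python
import numpy as np

def overlap_curvature_pair(grad, w_prev, w_curr, J_prev, J_curr):
    # Overlap O_k = J_k ∩ J_{k+1} between consecutive multi-batch samples.
    O_k = np.intersect1d(J_prev, J_curr)
    # Iterate difference s_k and gradient difference y_k, with BOTH gradients
    # evaluated on the same overlap set, so y_k reflects curvature of the
    # objective rather than a change of data.
    s_k = w_curr - w_prev
    y_k = grad(w_curr, O_k) - grad(w_prev, O_k)
    return s_k, y_k
```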
“…The use of overlap to compute y_k has been shown to result in more robust convergence in L-BFGS, since L-BFGS uses gradient differences to update its Hessian approximations (see [50] and [10])…”
Section: L-BFGS Line-search Deep Q-learning Methods
confidence: 99%
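
For context, the statement above refers to how L-BFGS turns the stored pairs (s_i, y_i) into a search direction, which is why a noisy y_k degrades robustness. The sketch below is the standard two-loop recursion, not code from the cited papers; NumPy arrays are assumed.

```python
import numpy as np

def lbfgs_direction(g, s_hist, y_hist):
    # Standard L-BFGS two-loop recursion: builds -H_k g from the stored
    # curvature pairs (s_i, y_i), oldest first in s_hist / y_hist.
    q = g.copy()
    alphas, rhos = [], []
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / np.dot(y, s)
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        alphas.append(alpha)
        rhos.append(rho)
    # Initial Hessian scaling gamma_k = s^T y / y^T y from the newest pair.
    if s_hist:
        s, y = s_hist[-1], y_hist[-1]
        q *= np.dot(s, y) / np.dot(y, y)
    for (s, y), alpha, rho in zip(zip(s_hist, y_hist),
                                  reversed(alphas), reversed(rhos)):
        beta = rho * np.dot(y, q)
        q += (alpha - beta) * s
    return -q
```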
“…A multi-batch L-BFGS method for machine learning was proposed in [130], which uses overlapping mini-batches between consecutive samples for the quasi-Newton update. This means that the calculation of…”
Section: SVRG [26]
confidence: 99%
“…• In order to form the difference in gradients, data overlap is used in [23,24]. Our method does not use data overlap (nor would it be practically feasible)…”
Section: Introduction
confidence: 99%
“…• We use stable curvature-pair updates as in [24,21]. In fact, whenever a new mini-batch is used, we skip the curvature update because the difference in gradients is then based on different data…”
Section: Introduction
confidence: 99%
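
A hedged sketch of the skip-on-new-mini-batch rule described in the last statement; the function name, history lists, memory size m, and the positivity threshold eps are illustrative assumptions rather than the exact scheme of [24] or [21].

```python
import numpy as np

def maybe_store_pair(s_k, y_k, batch_changed, s_hist, y_hist, m=10, eps=1e-8):
    # Skip the curvature pair whenever the mini-batch changed: the gradient
    # difference would then mix gradients computed on different data.
    if batch_changed:
        return
    # Standard positivity safeguard before accepting a BFGS curvature pair.
    if np.dot(s_k, y_k) > eps * np.dot(s_k, s_k):
        s_hist.append(s_k)
        y_hist.append(y_k)
        if len(s_hist) > m:  # keep only the m most recent pairs
            s_hist.pop(0)
            y_hist.pop(0)
```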