2019
DOI: 10.1080/10556788.2019.1658107

A robust multi-batch L-BFGS method for machine learning

Abstract: This paper describes an implementation of the L-BFGS method designed to deal with two adversarial situations. The first occurs in distributed computing environments where some of the computational nodes devoted to the evaluation of the function and gradient are unable to return results on time. A similar challenge occurs in a multi-batch approach in which the data points used to compute function and gradients are purposely changed at each iteration to accelerate the learning process. Difficulties arise because…

Cited by 63 publications (99 citation statements)
References 40 publications

“…Also, the experience memory D is emptied after each gradient computation, so the algorithm needs much less RAM. Inspired by [50], we use the overlap between consecutive multi-batch samples, O_k = J_k ∩ J_{k+1}, to compute y_k as…”
Section: L-BFGS Line-search Deep Q-learning Methods
confidence: 99%
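
A minimal sketch of the overlap-based curvature-pair computation quoted above, in Python with NumPy. The grad(w, idx) mini-batch gradient helper and all variable names are illustrative assumptions, not code from the cited implementations.

```python
import numpy as np

def overlap_curvature_pair(grad, w_prev, w_curr, J_prev, J_curr):
    # Overlap O_k = J_k ∩ J_{k+1} between consecutive multi-batch samples.
    O_k = np.intersect1d(J_prev, J_curr)
    # Iterate difference s_k and gradient difference y_k, with BOTH gradients
    # evaluated on the same overlap set, so y_k reflects curvature of the
    # objective rather than a change of data.
    s_k = w_curr - w_prev
    y_k = grad(w_curr, O_k) - grad(w_prev, O_k)
    return s_k, y_k
```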
“…The use of overlap to compute y_k has been shown to result in more robust convergence in L-BFGS, since L-BFGS uses gradient differences to update its Hessian approximations (see [50] and [10])…”
Section: L-BFGS Line-search Deep Q-learning Methods
confidence: 99%
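
For context, the statement above refers to how L-BFGS turns the stored pairs (s_i, y_i) into a search direction, which is why a noisy y_k degrades robustness. The sketch below is the standard two-loop recursion, not code from the cited papers; NumPy arrays are assumed.

```python
import numpy as np

def lbfgs_direction(g, s_hist, y_hist):
    # Standard L-BFGS two-loop recursion: builds -H_k g from the stored
    # curvature pairs (s_i, y_i), oldest first in s_hist / y_hist.
    q = g.copy()
    alphas, rhos = [], []
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / np.dot(y, s)
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        alphas.append(alpha)
        rhos.append(rho)
    # Initial Hessian scaling gamma_k = s^T y / y^T y from the newest pair.
    if s_hist:
        s, y = s_hist[-1], y_hist[-1]
        q *= np.dot(s, y) / np.dot(y, y)
    for (s, y), alpha, rho in zip(zip(s_hist, y_hist),
                                  reversed(alphas), reversed(rhos)):
        beta = rho * np.dot(y, q)
        q += (alpha - beta) * s
    return -q
```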
“…A multi-batch L-BFGS method for machine learning was proposed in [130], which uses overlapping mini-batches between consecutive samples for the quasi-Newton update. This means that the calculation of…”
Section: SVRG [26]
confidence: 99%
“…• In order to form the difference in gradients, data overlap is used in [23,24]. Our method does not use data overlap (nor would it be practically feasible)…”
Section: Introduction
confidence: 99%
“…• We use stable curvature-pair updates as in [24,21]. In fact, whenever a new mini-batch is used, we skip the curvature update because the difference in gradients is then based on different data…”
Section: Introduction
confidence: 99%
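
A hedged sketch of the skip-on-new-mini-batch rule described in the last statement; the function name, history lists, memory size m, and the positivity threshold eps are illustrative assumptions rather than the exact scheme of [24] or [21].

```python
import numpy as np

def maybe_store_pair(s_k, y_k, batch_changed, s_hist, y_hist, m=10, eps=1e-8):
    # Skip the curvature pair whenever the mini-batch changed: the gradient
    # difference would then mix gradients computed on different data.
    if batch_changed:
        return
    # Standard positivity safeguard before accepting a BFGS curvature pair.
    if np.dot(s_k, y_k) > eps * np.dot(s_k, s_k):
        s_hist.append(s_k)
        y_hist.append(y_k)
        if len(s_hist) > m:  # keep only the m most recent pairs
            s_hist.pop(0)
            y_hist.pop(0)
```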