2020
DOI: 10.1093/imanum/drz076
Adaptive cubic regularization methods with dynamic inexact Hessian information and applications to finite-sum minimization

Abstract: We consider the adaptive regularization with cubics (ARC) approach for solving nonconvex optimization problems and propose a new variant based on inexact Hessian information chosen dynamically. The theoretical analysis of the proposed procedure is given. The key property of the ARC framework, namely the optimal worst-case function/derivative evaluation bounds for first- and second-order critical points, is guaranteed. Application to large-scale finite-sum minimization based on subsampled Hessians is discussed and anal…
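To make the finite-sum setting concrete, below is a minimal sketch, not taken from the paper, of how a subsampled Hessian for an objective f(x) = (1/N) Σ_i f_i(x) could be formed. The function name, the per-term Hessian callback `hess_i`, and the fixed `sample_size` are hypothetical placeholders; in the dynamic ARC variant described in the abstract the sample size would instead be chosen adaptively from an accuracy requirement at the current iterate.

```python
# Minimal illustrative sketch (assumptions, not the paper's implementation):
# approximate the Hessian of f(x) = (1/N) * sum_i f_i(x) by averaging the
# per-term Hessians over a random subsample S with |S| << N.
import numpy as np

def subsampled_hessian(hess_i, N, x, sample_size, rng=None):
    """Average per-term Hessians hess_i(i, x) over a random subsample.

    hess_i      : callable returning the d x d Hessian of the i-th term at x
    N           : total number of terms in the finite sum
    x           : current iterate (d-dimensional NumPy array)
    sample_size : number of terms to draw (without replacement)
    """
    rng = np.random.default_rng() if rng is None else rng
    sample = rng.choice(N, size=sample_size, replace=False)
    d = x.shape[0]
    H = np.zeros((d, d))
    for i in sample:
        H += hess_i(i, x)
    return H / sample_size
```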

Cited by 30 publications (45 citation statements). References 42 publications (62 reference statements).
“…Results are reported in Table 4.4. We can see that, reducing N0, the number of full function/gradient evaluations reduces as well, and that for N0 = 0.01N the average classification error compares well with the error when N0 = 0.1N; for instance, the best results for cina0 and covertype are obtained by shrinking N0 to 1% of the maximum sample size.…”
mentioning
confidence: 75%
“…They do not call for function evaluations but require tuning the learning rate and possibly further hyper-parameters such as the mini-batch size. Since this tuning effort may be computationally very demanding [15], more sophisticated approaches use linesearch or trust-region strategies to choose the learning rate adaptively and avoid tuning efforts, see [2,4,5,9,14,15,25]. In this context, function and gradient approximations have to satisfy sufficient accuracy requirements with some probability.…”
Section: Introduction
mentioning
confidence: 99%
“…This paper attempts to answer a simple question: how does noise in function values and derivatives affect the evaluation complexity of smooth optimization? While analysis has been produced to indicate how high accuracy can be reached by optimization algorithms even in the presence of inexact but deterministic function and derivative values (see [8,16,28,3,29,21,14]), these approaches crucially rely on the assumption that the inexactness is controllable, in that it can be made arbitrarily small if required by the algorithm. But what happens in practical applications where significant noise is intrinsic and cannot be assumed away?…”
Section: Introduction
mentioning
confidence: 99%