2009
DOI: 10.1007/s10994-009-5142-6

Periodic step-size adaptation in second-order gradient descent for single-pass on-line structured learning

Abstract: It has been established that the second-order stochastic gradient descent (SGD) method can potentially achieve generalization performance as good as the empirical optimum in a single pass through the training examples. However, second-order SGD requires computing the inverse of the Hessian matrix of the loss function, which is prohibitively expensive for structured prediction problems that usually involve a very high dimensional feature space. This paper presents a new second-order SGD method, called Periodic Step-size Adaptation…
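To make concrete why the Hessian is the bottleneck described in the abstract, below is a minimal sketch (not the paper's algorithm) contrasting a first-order SGD step with a naive second-order step. `grad_fn` and `hess_fn` are assumed callables returning the stochastic gradient and the Hessian of the loss at the current parameters.

```python
import numpy as np

def first_order_sgd_step(w, grad_fn, eta):
    """Plain SGD: w <- w - eta * g(w); O(d) work per example."""
    return w - eta * grad_fn(w)

def second_order_sgd_step(w, grad_fn, hess_fn):
    """Naive second-order SGD: w <- w - H(w)^{-1} g(w).

    Forming H takes O(d^2) memory and solving the linear system O(d^3)
    time, which is what becomes prohibitive when d is the dimensionality
    of a structured-prediction feature space (often millions).
    """
    H = hess_fn(w)                    # d x d Hessian of the loss
    g = grad_fn(w)                    # d-dimensional stochastic gradient
    return w - np.linalg.solve(H, g)  # avoids explicit inversion, still O(d^3)
```

The method proposed in the paper, as the citation statements below describe, is aimed at avoiding exactly this d × d cost in each online update.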

Cited by 7 publications (21 citation statements) · References 31 publications

“…In many cases, therefore, efficient optimization remains an open problem in structured prediction. Two papers in this special issue (Hsu et al 2009;Sutton and McCallum 2009) address this problem. An important source of complexity in structured prediction algorithms is in the iterative nature of the training step: Often, training is done EM-style, where model parameters are estimated at each step, and then inference is performed based on these parameters.…”
Section: Overview of this special issue
confidence: 99%
“…Unfortunately, computing the Hessian for loss functions on structured objects involves correlations between labels in different cliques, which is prohibitively expensive in most cases. The paper, "Periodic step-size adaptation in second-order gradient descent for single-pass online structured learning" by Hsu et al (2009) proposes a second-order stochastic gradient descent method in an online setting. The method approximates the Hessian by exploring a linear relation between the Hessian and the Jacobian such that the computation can be performed very efficiently for each online update.…”
Section: Overview of this special issue
confidence: 99%
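One way to read the "linear relation between the Hessian and the Jacobian" mentioned in the statement above is the following illustrative sketch, stated under a diagonal approximation and not to be taken as the exact update of Hsu et al. (2009): the SGD map T(w) = w − η g(w) has Jacobian J = I − η H, so H = (I − J)/η, and diag(J) can be estimated cheaply from consecutive iterates.

```python
import numpy as np

def diag_curvature_from_iterates(w_prev, w_curr, w_next, eta, eps=1e-12):
    """Per-coordinate curvature estimate from three consecutive SGD iterates.

    For the SGD map T(w) = w - eta * g(w), the Jacobian is J = I - eta * H.
    To first order, successive parameter differences satisfy
        (w_next - w_curr) ~= J (w_curr - w_prev),
    so under a diagonal approximation of J,
        diag(J) ~= (w_next - w_curr) / (w_curr - w_prev)
        diag(H) ~= (1 - diag(J)) / eta.
    Only O(d) work and memory per estimate; no d x d matrix is formed.
    """
    d_prev = w_curr - w_prev
    d_curr = w_next - w_curr
    safe = np.where(np.abs(d_prev) > eps, d_prev, eps)  # guard against division by ~0
    diag_J = d_curr / safe
    return (1.0 - diag_J) / eta
```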
“…For converting a conventional batch CRF to an on-line one, several methods have been successfully applied.8,22,27 In general, the stochastic method used in NNs has been applied to conventional batch CRFs in order to convert them to on-line ones. One type is the Stochastic Gradient Descent (SGD), which has been most successfully applied to CRFs.…”
Section: Introduction
confidence: 99%
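For concreteness, a single on-line (stochastic) gradient step for a log-linear CRF might look like the hedged sketch below. `empirical_features` and `expected_features` are hypothetical helpers, not functions from the cited works; the latter hides the inference step (e.g. forward-backward for a linear-chain CRF), which is where the structured-prediction cost lives.

```python
def crf_sgd_step(w, x, y, eta, empirical_features, expected_features, l2=1e-4):
    """One stochastic gradient step on the L2-regularized negative
    log-likelihood of a log-linear CRF, for a single example (x, y).

    gradient = E_{p(y'|x; w)}[f(x, y')] - f(x, y) + l2 * w
    The model expectation is the inference step and dominates the cost;
    the parameter update itself is O(d).  Arrays are assumed NumPy-like.
    """
    grad = expected_features(x, w) - empirical_features(x, y) + l2 * w
    return w - eta * grad
```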
“…The learning algorithms for batch CRFs are generally based on second-order information requiring computation of the inversion of the Hessian.8,27 In order to reduce the computation time of the inversion of the Hessian in on-line learning algorithms for CRFs, the Hessian-vector product,17,27 Quasi-Newton SGD (SGD-QN),1 Stochastic Meta-Descent (SMD)27 and Componentwise Triple Jump method for Penalized generalized Iterative Scaling (CTJPIS)8 were applied.…”
Section: Introduction
confidence: 99%
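Of the techniques listed above, the Hessian-vector product is the easiest to sketch. A common finite-difference version (a generic illustration, not tied to any particular cited paper) needs only two gradient evaluations and never forms the d × d Hessian.

```python
def hessian_vector_product(grad_fn, w, v, r=1e-6):
    """Approximate H(w) @ v by a forward finite difference of gradients:

        H v ~= (g(w + r * v) - g(w)) / r

    `grad_fn` is assumed to return the gradient of the loss at a parameter
    vector; w and v are NumPy-like arrays of the same shape.
    """
    return (grad_fn(w + r * v) - grad_fn(w)) / r
```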