2009
DOI: 10.1007/s10994-009-5142-6

Periodic step-size adaptation in second-order gradient descent for single-pass on-line structured learning

Abstract: It has been established that the second-order stochastic gradient descent (SGD) method can potentially achieve generalization performance as good as the empirical optimum in a single pass through the training examples. However, second-order SGD requires computing the inverse of the Hessian matrix of the loss function, which is prohibitively expensive for structured prediction problems that usually involve a very high dimensional feature space. This paper presents a new second-order SGD method, called Periodic Step-size Adaptation…
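To make concrete why the Hessian is the bottleneck described in the abstract, below is a minimal sketch (not the paper's algorithm) contrasting a first-order SGD step with a naive second-order step. `grad_fn` and `hess_fn` are assumed callables returning the stochastic gradient and the Hessian of the loss at the current parameters.

```python
import numpy as np

def first_order_sgd_step(w, grad_fn, eta):
    """Plain SGD: w <- w - eta * g(w); O(d) work per example."""
    return w - eta * grad_fn(w)

def second_order_sgd_step(w, grad_fn, hess_fn):
    """Naive second-order SGD: w <- w - H(w)^{-1} g(w).

    Forming H takes O(d^2) memory and solving the linear system O(d^3)
    time, which is what becomes prohibitive when d is the dimensionality
    of a structured-prediction feature space (often millions).
    """
    H = hess_fn(w)                    # d x d Hessian of the loss
    g = grad_fn(w)                    # d-dimensional stochastic gradient
    return w - np.linalg.solve(H, g)  # avoids explicit inversion, still O(d^3)
```

The method proposed in the paper, as the citation statements below describe, is aimed at avoiding exactly this d × d cost in each online update.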

Cited by 7 publications (21 citation statements) · References 31 publications

“…In many cases, therefore, efficient optimization remains an open problem in structured prediction. Two papers in this special issue (Hsu et al 2009;Sutton and McCallum 2009) address this problem. An important source of complexity in structured prediction algorithms is in the iterative nature of the training step: Often, training is done EM-style, where model parameters are estimated at each step, and then inference is performed based on these parameters.…”
Section: Overview of this special issue
confidence: 99%
“…Unfortunately, computing the Hessian for loss functions on structured objects involves correlations between labels in different cliques, which is prohibitively expensive in most cases. The paper, "Periodic step-size adaptation in second-order gradient descent for single-pass online structured learning" by Hsu et al (2009) proposes a second-order stochastic gradient descent method in an online setting. The method approximates the Hessian by exploring a linear relation between the Hessian and the Jacobian such that the computation can be performed very efficiently for each online update.…”
Section: Overview of this special issue
confidence: 99%
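One way to read the "linear relation between the Hessian and the Jacobian" mentioned in the statement above is the following illustrative sketch, stated under a diagonal approximation and not to be taken as the exact update of Hsu et al. (2009): the SGD map T(w) = w − η g(w) has Jacobian J = I − η H, so H = (I − J)/η, and diag(J) can be estimated cheaply from consecutive iterates.

```python
import numpy as np

def diag_curvature_from_iterates(w_prev, w_curr, w_next, eta, eps=1e-12):
    """Per-coordinate curvature estimate from three consecutive SGD iterates.

    For the SGD map T(w) = w - eta * g(w), the Jacobian is J = I - eta * H.
    To first order, successive parameter differences satisfy
        (w_next - w_curr) ~= J (w_curr - w_prev),
    so under a diagonal approximation of J,
        diag(J) ~= (w_next - w_curr) / (w_curr - w_prev)
        diag(H) ~= (1 - diag(J)) / eta.
    Only O(d) work and memory per estimate; no d x d matrix is formed.
    """
    d_prev = w_curr - w_prev
    d_curr = w_next - w_curr
    safe = np.where(np.abs(d_prev) > eps, d_prev, eps)  # guard against division by ~0
    diag_J = d_curr / safe
    return (1.0 - diag_J) / eta
```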
“…For converting a conventional batch CRF to an on-line one, several methods have been successfully applied.8,22,27 In general, the stochastic method used in NNs has been applied to conventional batch CRFs in order to convert them to on-line ones. One type is the Stochastic Gradient Descent (SGD), which has been most successfully applied to CRFs.…”
Section: Introduction
confidence: 99%
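For concreteness, a single on-line (stochastic) gradient step for a log-linear CRF might look like the hedged sketch below. `empirical_features` and `expected_features` are hypothetical helpers, not functions from the cited works; the latter hides the inference step (e.g. forward-backward for a linear-chain CRF), which is where the structured-prediction cost lives.

```python
def crf_sgd_step(w, x, y, eta, empirical_features, expected_features, l2=1e-4):
    """One stochastic gradient step on the L2-regularized negative
    log-likelihood of a log-linear CRF, for a single example (x, y).

    gradient = E_{p(y'|x; w)}[f(x, y')] - f(x, y) + l2 * w
    The model expectation is the inference step and dominates the cost;
    the parameter update itself is O(d).  Arrays are assumed NumPy-like.
    """
    grad = expected_features(x, w) - empirical_features(x, y) + l2 * w
    return w - eta * grad
```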
“…The learning algorithms for batch CRFs are generally based on second-order information requiring computation of the inversion of the Hessian.8,27 In order to reduce the computation time of the inversion of the Hessian in on-line learning algorithms for CRFs, the Hessian-vector product,17,27 Quasi-Newton SGD (SGD-QN),1 Stochastic Meta-Descent (SMD)27 and Componentwise Triple Jump method for Penalized generalized Iterative Scaling (CTJPIS)8 were applied.…”
Section: Introduction
confidence: 99%
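Of the techniques listed above, the Hessian-vector product is the easiest to sketch. A common finite-difference version (a generic illustration, not tied to any particular cited paper) needs only two gradient evaluations and never forms the d × d Hessian.

```python
def hessian_vector_product(grad_fn, w, v, r=1e-6):
    """Approximate H(w) @ v by a forward finite difference of gradients:

        H v ~= (g(w + r * v) - g(w)) / r

    `grad_fn` is assumed to return the gradient of the loss at a parameter
    vector; w and v are NumPy-like arrays of the same shape.
    """
    return (grad_fn(w + r * v) - grad_fn(w)) / r
```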